
Java WordTokenizer Class Code Examples


This article collects typical usage examples of the Java class weka.core.tokenizers.WordTokenizer. If you are wondering how the WordTokenizer class is used in practice, or you are looking for working examples, the selected code samples below may help.



The WordTokenizer class belongs to the weka.core.tokenizers package. Six code examples of the class are shown below, sorted by popularity by default.
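Before the project examples, it may help to see what word tokenization of this kind produces. The following is a minimal sketch that uses only java.util.StringTokenizer from the standard library, not the Weka class itself. The class name WordSplitSketch, the method split, and the DELIMITERS constant are hypothetical names introduced for illustration, and the delimiter set is an assumption modeled on the default documented for WordTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WordSplitSketch {

    // Assumed delimiter set (whitespace plus common punctuation); not read from Weka.
    static final String DELIMITERS = " \r\n\t.,;:'\"()?!";

    /** Splits text on any delimiter character; empty tokens are never produced. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s, a, test]
        System.out.println(split("Hello, world! (It's a test.)"));
    }
}
```

Note that the apostrophe is treated as a delimiter in this sketch, so "It's" yields the two tokens "It" and "s".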

Example 1: getWordFilter

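import weka.core.Instances;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
import weka.filters.unsupervised.attribute.StringToWordVector;

// Builds a StringToWordVector filter that tokenizes text with WordTokenizer.
// Note: the struct parameter is not used in this snippet.
public static StringToWordVector getWordFilter(Instances data, ClassifierStructure struct) {
    StringToWordVector filter = new StringToWordVector();

    try {
        filter.setWordsToKeep(20000000); // large limit, so few words are dropped
        WordTokenizer token = new WordTokenizer();
        filter.setTokenizer(token);
        filter.setInputFormat(data); // called last, after the options are set
    } catch (Exception e1) {
        e1.printStackTrace();
    }

    return filter;
}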
 
Developer ID: haneev, Project: TweetRetriever, Lines of code: 15, Source: LiveClassifier.java


Example 2: listOptions

import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));
  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(
    new Option("\tSpecify L-norm to use (default 2.0)", "lnorm", 1,
      "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(
    new Option("\tThe stopwords handler to use (default Null).",
      "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}
 
Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 47, Source: NaiveBayesMultinomialText.java
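The -tokenizer option in the listing above is described as taking a "classname plus parameters" spec. As a rough illustration of that format only, the sketch below separates such a spec into its class name and the remaining parameters using plain whitespace splitting. This is a simplification and not Weka's own parsing: it ignores quoting and escaping, the class TokenizerSpecSketch and its methods are hypothetical, and the spec string in main is an illustrative value.

```java
import java.util.Arrays;

public class TokenizerSpecSketch {

    /** Returns the class name, taken as the first whitespace-separated field. */
    public static String className(String spec) {
        return spec.trim().split("\\s+")[0];
    }

    /** Returns every field after the class name as the parameter array. */
    public static String[] parameters(String spec) {
        String[] fields = spec.trim().split("\\s+");
        return Arrays.copyOfRange(fields, 1, fields.length);
    }

    public static void main(String[] args) {
        // Illustrative spec value; quoted arguments are not handled by this sketch.
        String spec = "weka.core.tokenizers.WordTokenizer -delimiters .,;:";
        System.out.println(className(spec));
        System.out.println(Arrays.toString(parameters(spec)));
    }
}
```

A spec that contains only a class name gives an empty parameter array, which corresponds to using the tokenizer with its defaults.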


Example 3: listOptions

import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 * 
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = "
    + "hinge loss (SVM), 1 = log loss (logistic regression)\n\t"
    + "(default = 0)", "F", 1, "-F"));
  newVector
    .add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t"
      + "model to the output of the SVM)", "output-probs", 0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1,
    "-L"));
  newVector.add(new Option("\tThe lambda regularization constant "
    + "(default = 0.0001)", "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform ("
    + "batch learning only, default = 500)", "E", 1, "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));

  newVector.add(new Option("\tMinimum absolute value of coefficients " +
    "in the model.\n\tIf periodic pruning is turned on then this\n\t"
    + "is also used to prune words from the dictionary\n\t"
    + "(default = 0.001)", "min-coeff", 1, "-min-coeff <double>"));

  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
    "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
    "\tThe stopwords handler to use (default Null).",
    "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}
 
Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 63, Source: SGDText.java


Example 4: listOptions

import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of " +
              "binary bag of words.", "W", 0, 
              "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
              "of low frequency words (default = 0, i.e. don't prune)", 
              "P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
              "than this frequency are ignored.\n\tIf periodic pruning " +
              "is turned on then this is also used to determine which\n\t" +
              "words to remove from the dictionary (default = 3).",
              "M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
              "before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  
  return newVector.elements();
}
 
Developer ID: dsibournemouth, Project: autoweka, Lines of code: 53, Source: NaiveBayesMultinomialText.java


Example 5: listOptions

import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = " +
      "hinge loss (SVM), 1 = log loss (logistic regression)\n\t" +
      "(default = 0)", "F", 1, "-F"));
  newVector.add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t" +
  		"model to the output of the SVM)", "output-probs", 
      0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1, "-L"));
  newVector.add(new Option("\tThe lambda regularization constant " +
              "(default = 0.0001)",
              "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform (" +
              "batch learning only, default = 500)", "E", 1,
              "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of " +
  		"binary bag of words.", "W", 0, 
  		"-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
  		"of low frequency words (default = 0, i.e. don't prune)", 
  		"P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
  		"than this frequency are ignored.\n\tIf periodic pruning " +
  		"is turned on then this is also used to determine which\n\t" +
  		"words to remove from the dictionary (default = 3).",
  		"M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
  		"before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  
  return newVector.elements();
}
 
Developer ID: dsibournemouth, Project: autoweka, Lines of code: 65, Source: SGDText.java


Example 6: listOptions

import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 * 
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));
  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
    "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
    "\tThe stopwords handler to use (default Null).",
    "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}
 
Developer ID: umple, Project: umple, Lines of code: 46, Source: NaiveBayesMultinomialText.java



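Note: The weka.core.tokenizers.WordTokenizer examples in this article were compiled from source-code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. Refer to each project's License before distributing or using the code. Do not reproduce this article without permission.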

