Java WordTokenizer Class Code Examples



This article collects typical usage examples of the Java class weka.core.tokenizers.WordTokenizer. If you are wondering how the WordTokenizer class is used in practice, or are looking for working examples of it, the selected code examples here may help.

The WordTokenizer class belongs to the weka.core.tokenizers package. Six code examples of the class are shown below, sorted by popularity by default.
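WordTokenizer splits a string into tokens on a fixed set of delimiter characters. The sketch below approximates that behavior in plain Java with java.util.StringTokenizer, so it runs without Weka on the classpath. The class name WordTokenizerSketch and the DELIMITERS constant are made up for illustration, and the delimiter set is an assumption meant to mirror Weka's default; check the Javadoc of your Weka version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for weka.core.tokenizers.WordTokenizer; not Weka code.
final class WordTokenizerSketch {

    // Assumed to mirror WordTokenizer's default delimiters:
    // space, CR, LF, tab and the characters . , ; : ' " ( ) ? !
    static final String DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Splits the text on any delimiter character and drops empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s, Weka]
        System.out.println(tokenize("Hello, world! (It's Weka.)"));
    }
}
```

Because the apostrophe is in the delimiter set, "It's" comes out as the two tokens "It" and "s". This matters when the tokens later feed a bag-of-words filter.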

Example 1: getWordFilter

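import weka.core.Instances;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example
import weka.filters.unsupervised.attribute.StringToWordVector;

// Builds a StringToWordVector filter that tokenizes with WordTokenizer.
// ClassifierStructure is a project-specific type; the parameter is unused here.
public static StringToWordVector getWordFilter(Instances data, ClassifierStructure struct) {
    StringToWordVector filter = new StringToWordVector();

    try {
        filter.setWordsToKeep(20000000); // very high cap, so practically no word is dropped
        WordTokenizer token = new WordTokenizer();
        filter.setTokenizer(token);
        filter.setInputFormat(data); // call last, after all options are set
    } catch (Exception e1) {
        e1.printStackTrace();
    }

    return filter;
}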

Developer ID: haneev, Project: TweetRetriever, Lines of code: 15, Source: LiveClassifier.java

Example 2: listOptions

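import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;

import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example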
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));
  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(
    new Option("\tSpecify L-norm to use (default 2.0)", "lnorm", 1,
      "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(
    new Option("\tThe stopwords handler to use (default Null).",
      "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}

Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 47, Source: NaiveBayesMultinomialText.java
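The -tokenizer <spec> option in the listing above is described as "classname plus parameters". A minimal sketch of how such a spec string can be separated into those two parts, using plain whitespace splitting, is given here. TokenizerSpecSketch, className, and options are hypothetical helpers, not Weka API. Weka's own option parsing also handles quoted arguments, which this sketch ignores, and the -delimiters flag appears only as illustrative input.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Weka.
final class TokenizerSpecSketch {

    // Returns the first whitespace-separated word of the spec: the class name.
    static String className(String spec) {
        String[] parts = spec.trim().split("\\s+");
        return parts[0];
    }

    // Returns everything after the class name: the tokenizer's own parameters.
    static String[] options(String spec) {
        String[] parts = spec.trim().split("\\s+");
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        String spec = "weka.core.tokenizers.WordTokenizer -delimiters .,;";
        System.out.println(className(spec));
        System.out.println(Arrays.toString(options(spec)));
    }
}
```

In a real Weka program the class name would then be instantiated and the remaining parameters handed to that object, which is why the help text asks for both in a single string.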

Example 3: listOptions

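import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;

import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example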
/**
 * Returns an enumeration describing the available options.
 * 
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = "
    + "hinge loss (SVM), 1 = log loss (logistic regression)\n\t"
    + "(default = 0)", "F", 1, "-F"));
  newVector
    .add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t"
      + "model to the output of the SVM)", "output-probs", 0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1,
    "-L"));
  newVector.add(new Option("\tThe lambda regularization constant "
    + "(default = 0.0001)", "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform ("
    + "batch learning only, default = 500)", "E", 1, "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));

  newVector.add(new Option("\tMinimum absolute value of coefficients " +
    "in the model.\n\tIf periodic pruning is turned on then this\n\t"
    + "is also used to prune words from the dictionary\n\t"
    + "(default = 0.001)", "min-coeff", 1, "-min-coeff <double>"));

  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
    "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
    "\tThe stopwords handler to use (default Null).",
    "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}

Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 63, Source: SGDText.java

Example 4: listOptions

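import java.util.Enumeration;
import java.util.Vector;

import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example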
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of " +
              "binary bag of words.", "W", 0, 
              "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
              "of low frequency words (default = 0, i.e. don't prune)", 
              "P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
              "than this frequency are ignored.\n\tIf periodic pruning " +
              "is turned on then this is also used to determine which\n\t" +
              "words to remove from the dictionary (default = 3).",
              "M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
              "before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  
  return newVector.elements();
}

Developer ID: dsibournemouth, Project: autoweka, Lines of code: 53, Source: NaiveBayesMultinomialText.java

Example 5: listOptions

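import java.util.Enumeration;
import java.util.Vector;

import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example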
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = " +
      "hinge loss (SVM), 1 = log loss (logistic regression)\n\t" +
      "(default = 0)", "F", 1, "-F"));
  newVector.add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t" +
      "model to the output of the SVM)", "output-probs",
      0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1, "-L"));
  newVector.add(new Option("\tThe lambda regularization constant " +
              "(default = 0.0001)",
              "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform (" +
              "batch learning only, default = 500)", "E", 1,
              "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of " +
      "binary bag of words.", "W", 0,
      "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
      "of low frequency words (default = 0, i.e. don't prune)",
      "P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
      "than this frequency are ignored.\n\tIf periodic pruning " +
      "is turned on then this is also used to determine which\n\t" +
      "words to remove from the dictionary (default = 3).",
      "M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
  		"before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  
  return newVector.elements();
}

Developer ID: dsibournemouth, Project: autoweka, Lines of code: 65, Source: SGDText.java

Example 6: listOptions

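import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;

import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // class demonstrated in this example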
/**
 * Returns an enumeration describing the available options.
 * 
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {

  Vector<Option> newVector = new Vector<Option>();

  newVector.add(new Option("\tUse word frequencies instead of "
    + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
    + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
    "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
    + "than this frequency are ignored.\n\tIf periodic pruning "
    + "is turned on then this is also used to determine which\n\t"
    + "words to remove from the dictionary (default = 3).", "M", 1,
    "-M <double>"));
  newVector.addElement(new Option(
    "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
    "\tSpecify the norm that each instance must have (default 1.0)", "norm",
    1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
    "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
    + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
    "\tThe stopwords handler to use (default Null).",
    "-stopwords-handler", 1, "-stopwords-handler"));
  newVector.addElement(new Option(
    "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
    "-tokenizer <spec>"));
  newVector.addElement(new Option(
    "\tThe stemming algorithm (classname plus parameters) to use.",
    "stemmer", 1, "-stemmer <spec>"));

  newVector.addAll(Collections.list(super.listOptions()));

  return newVector.elements();
}

Developer ID: umple, Project: umple, Lines of code: 46, Source: NaiveBayesMultinomialText.java

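Note: The weka.core.tokenizers.WordTokenizer examples in this article were collected from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects; copyright remains with the original authors, and distribution and use are subject to each project's License. Do not reproduce without permission.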
