This article collects typical usage examples of the Java class weka.core.tokenizers.WordTokenizer. If you want to know how WordTokenizer is used in practice, or are looking for concrete examples of it, the code below may help. It was selected from open-source projects.
WordTokenizer belongs to the weka.core.tokenizers package. Six code examples that use it are shown below, ordered by popularity.
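WordTokenizer splits a string into tokens wherever one of a set of delimiter characters occurs. With the Weka class, you call tokenize(String) and then loop over hasMoreElements()/nextElement().

The sketch below reproduces that behaviour with java.util.StringTokenizer so it runs without weka.jar. Two things are assumptions: the DEFAULT_DELIMITERS value is our understanding of Weka's default (whitespace plus . , ; : ' " ( ) ? !), so check CharacterDelimitedTokenizer in your Weka version; and the class name WordTokenizerSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical standalone approximation of weka.core.tokenizers.WordTokenizer.
// ASSUMPTION: DEFAULT_DELIMITERS mirrors Weka's default delimiter set.
class WordTokenizerSketch {
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Split on any delimiter character; delimiters are dropped and
    // runs of consecutive delimiters never produce empty tokens.
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    static List<String> tokenize(String text) {
        return tokenize(text, DEFAULT_DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! It's (really) simple."));
    }
}
```

Running main prints [Hello, world, It, s, really, simple]. Under this assumed default the apostrophe is a delimiter, so "It's" becomes two tokens.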
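Example 1: getWordFilter
import weka.core.Instances;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
import weka.filters.unsupervised.attribute.StringToWordVector;
// Note: the struct parameter is not used in this snippet.
public static StringToWordVector getWordFilter(Instances data, ClassifierStructure struct) {
  StringToWordVector filter = new StringToWordVector();
  try {
    // Keep up to ~20 million words, i.e. effectively no vocabulary cap
    filter.setWordsToKeep(20000000);
    // Split text into tokens on WordTokenizer's delimiter characters
    WordTokenizer token = new WordTokenizer();
    filter.setTokenizer(token);
    // Set the input format last, after all options are configured
    filter.setInputFormat(data);
  } catch (Exception e1) {
    // Only logged: the filter is still returned but may not be usable
    e1.printStackTrace();
  }
  return filter;
}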
Developer ID: haneev, Project: TweetRetriever, Lines of code: 15, Source: LiveClassifier.java
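Example 1 hands the tokens to StringToWordVector, which turns a string attribute into word attributes (a bag of words). The call setWordsToKeep(20000000) asks for up to roughly 20 million words, which is effectively no limit. With the real filter, the transformed data would then come from Filter.useFilter(data, filter).

The following is a simplified, standalone sketch of the bag-of-words idea, not Weka's implementation. It counts tokens across a small corpus, keeps the wordsToKeep most frequent words, and maps a document either to counts or to 0/1 presence values. BagOfWordsSketch and its methods are hypothetical names, and the delimiter set is the same assumption as for WordTokenizer's default. Weka's filter offers much more (for example TF and IDF transforms), none of which is modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical, simplified bag-of-words sketch; NOT Weka's StringToWordVector.
class BagOfWordsSketch {
    // ASSUMPTION: same delimiter set as WordTokenizer's default.
    static final String DELIMITERS = " \r\n\t.,;:'\"()?!";

    static List<String> tokens(String doc) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(doc, DELIMITERS);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Keep at most wordsToKeep words, most frequent first; ties sorted alphabetically.
    static List<String> buildDictionary(List<String> docs, int wordsToKeep) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String t : tokens(doc)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(words.subList(0, Math.min(wordsToKeep, words.size())));
    }

    // Counts per dictionary word if useCounts, otherwise 0/1 presence (binary bag of words).
    static int[] vectorize(String doc, List<String> dictionary, boolean useCounts) {
        int[] v = new int[dictionary.size()];
        for (String t : tokens(doc)) {
            int i = dictionary.indexOf(t);
            if (i >= 0) {
                v[i] = useCounts ? v[i] + 1 : 1;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "the cat ran");
        List<String> dict = buildDictionary(docs, 3);
        System.out.println(dict);
        System.out.println(Arrays.toString(vectorize("the cat, the cat!", dict, true)));
    }
}
```

For this three-document corpus with wordsToKeep = 3, main prints [the, cat, sat] and then [2, 2, 0].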
Example 2: listOptions
import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;
import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {
  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tUse word frequencies instead of "
      + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
      + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
      "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
      + "than this frequency are ignored.\n\tIf periodic pruning "
      + "is turned on then this is also used to determine which\n\t"
      + "words to remove from the dictionary (default = 3).", "M", 1,
      "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)", "norm",
      1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
      + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  // Option names carry no leading dash (cf. "tokenizer", "stemmer")
  newVector.addElement(new Option(
      "\tThe stopwords handler to use (default Null).",
      "stopwords-handler", 1, "-stopwords-handler <spec>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
      "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  // Append the options inherited from the superclass
  newVector.addAll(Collections.list(super.listOptions()));
  return newVector.elements();
}
Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 47, Source: NaiveBayesMultinomialText.java
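The -normalize, -norm and -lnorm options in Example 2 describe document-length normalization. One reading is that each document vector v is rescaled so that its L-p norm equals the target:

v_i ← v_i · norm / (Σ_j |v_j|^p)^(1/p)

Here p is the -lnorm value and norm is the -norm value. The sketch below assumes that reading. NormalizeSketch is a hypothetical name, and Weka may restrict the computation to particular attributes.

```java
import java.util.Arrays;

// Hypothetical sketch of document-length normalization (-normalize, -norm, -lnorm).
// ASSUMPTION: the vector is rescaled so that its L-p norm equals `norm`.
class NormalizeSketch {
    // L-p norm: (sum of |x|^p)^(1/p)
    static double lpNorm(double[] v, double p) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    // Returns a rescaled copy; an all-zero (empty) document is returned unchanged.
    static double[] normalize(double[] v, double norm, double p) {
        double current = lpNorm(v, p);
        double[] out = v.clone();
        if (current == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = out[i] * norm / current;
        }
        return out;
    }

    public static void main(String[] args) {
        // Defaults norm = 1.0 and lnorm = 2.0 give approximately [0.6, 0.8]
        System.out.println(Arrays.toString(normalize(new double[] {3.0, 4.0}, 1.0, 2.0)));
    }
}
```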
Example 3: listOptions
import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;
import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {
  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = "
      + "hinge loss (SVM), 1 = log loss (logistic regression)\n\t"
      + "(default = 0)", "F", 1, "-F"));
  newVector.add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t"
      + "model to the output of the SVM)", "output-probs", 0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1,
      "-L"));
  newVector.add(new Option("\tThe lambda regularization constant "
      + "(default = 0.0001)", "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform ("
      + "batch learning only, default = 500)", "E", 1, "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of "
      + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
      + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
      "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
      + "than this frequency are ignored.\n\tIf periodic pruning "
      + "is turned on then this is also used to determine which\n\t"
      + "words to remove from the dictionary (default = 3).", "M", 1,
      "-M <double>"));
  newVector.add(new Option("\tMinimum absolute value of coefficients "
      + "in the model.\n\tIf periodic pruning is turned on then this\n\t"
      + "is also used to prune words from the dictionary\n\t"
      + "(default = 0.001)", "min-coeff", 1, "-min-coeff <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)", "norm",
      1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
      + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  // Option names carry no leading dash (cf. "tokenizer", "stemmer")
  newVector.addElement(new Option(
      "\tThe stopwords handler to use (default Null).",
      "stopwords-handler", 1, "-stopwords-handler <spec>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
      "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  // Append the options inherited from the superclass
  newVector.addAll(Collections.list(super.listOptions()));
  return newVector.elements();
}
Developer ID: mydzigear, Project: repo.kmeanspp.silhouette_score, Lines of code: 63, Source: SGDText.java
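The first options of SGDText in Example 3 set up the learner:

- -F chooses the loss: hinge loss (SVM) or log loss (logistic regression).
- -L sets the learning rate.
- -R sets the L2 regularization constant λ.
- -E sets the number of epochs.

The sketch below is a generic textbook stochastic-gradient step for a linear model under those two losses, assuming labels in {-1, +1}. It is not SGDText's actual code, which may treat the bias, the learning-rate schedule and sparse text features differently. SgdStepSketch and step are hypothetical names.

```java
// Hypothetical textbook SGD step for a linear classifier with L2 regularization.
// ASSUMPTION: labels are -1/+1; this is NOT SGDText's actual implementation.
class SgdStepSketch {
    static final int HINGE = 0; // like -F 0: hinge loss (SVM)
    static final int LOG = 1;   // like -F 1: log loss (logistic regression)

    // Updates w in place for one example (x, y) and returns the updated bias.
    static double step(double[] w, double bias, double[] x, int y,
                       int loss, double learningRate, double lambda) {
        double z = bias;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        double margin = y * z;
        // g = minus the derivative of the loss with respect to the margin
        double g;
        if (loss == HINGE) {
            g = margin < 1.0 ? 1.0 : 0.0;       // hinge: only margin violators move
        } else {
            g = 1.0 / (1.0 + Math.exp(margin)); // log loss: sigmoid(-margin)
        }
        for (int i = 0; i < w.length; i++) {
            // L2 shrinkage (lambda) plus a step toward the correct side
            w[i] = w[i] * (1.0 - learningRate * lambda) + learningRate * g * y * x[i];
        }
        return bias + learningRate * g * y;     // the bias is not regularized here
    }

    public static void main(String[] args) {
        double[][] xs = {{1.0}, {2.0}, {-1.0}, {-2.0}};
        int[] ys = {1, 1, -1, -1};
        double[] w = new double[1];
        double b = 0.0;
        for (int epoch = 0; epoch < 500; epoch++) {                 // -E default
            for (int n = 0; n < xs.length; n++) {
                b = step(w, b, xs[n], ys[n], HINGE, 0.01, 0.0001); // -L, -R defaults
            }
        }
        System.out.println("w = " + w[0] + ", b = " + b);
    }
}
```

Training for -E epochs amounts to calling step once per instance in every epoch. main does exactly that with the default -L and -R values on four toy points.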
Example 4: listOptions
import java.util.Enumeration;
import java.util.Vector;
import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {
  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tUse word frequencies instead of " +
      "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
      "of low frequency words (default = 0, i.e. don't prune)",
      "P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
      "than this frequency are ignored.\n\tIf periodic pruning " +
      "is turned on then this is also used to determine which\n\t" +
      "words to remove from the dictionary (default = 3).",
      "M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
      "before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  return newVector.elements();
}
Developer ID: dsibournemouth, Project: autoweka, Lines of code: 53, Source: NaiveBayesMultinomialText.java
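The -stopwords option in Example 4 spells out a file format: one stopword per line, with lines starting with '#' treated as comments. The sketch below parses that format and filters a token list.

Several details are assumptions, not Weka's documented loader behaviour:

- whitespace around each line is trimmed;
- blank lines are skipped;
- a missing file yields an empty set, loosely mirroring "if the file exists".

StopwordsSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for the stopword-file format described by -stopwords.
class StopwordsSketch {

    // One stopword per line; lines starting with '#' are comments.
    // ASSUMPTION: lines are trimmed first and blank lines are skipped.
    static Set<String> parse(List<String> lines) {
        Set<String> stopwords = new HashSet<>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;
            }
            stopwords.add(s);
        }
        return stopwords;
    }

    // ASSUMPTION: a missing file yields an empty stoplist instead of an error.
    static Set<String> load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return Collections.emptySet();
        }
        return parse(Files.readAllLines(file));
    }

    // Drop every token that appears in the stoplist (exact, case-sensitive match).
    static List<String> filter(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Matching is case-sensitive here. In the classifiers above, case folding is the job of the separate -lowercase option.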
Example 5: listOptions
import java.util.Enumeration;
import java.util.Vector;
import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
public Enumeration<Option> listOptions() {
  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tSet the loss function to minimize. 0 = " +
      "hinge loss (SVM), 1 = log loss (logistic regression)\n\t" +
      "(default = 0)", "F", 1, "-F"));
  newVector.add(new Option("\tOutput probabilities for SVMs (fits a logistic\n\t" +
      "model to the output of the SVM)", "output-probs",
      0, "-outputProbs"));
  newVector.add(new Option("\tThe learning rate (default = 0.01).", "L", 1, "-L"));
  newVector.add(new Option("\tThe lambda regularization constant " +
      "(default = 0.0001)",
      "R", 1, "-R <double>"));
  newVector.add(new Option("\tThe number of epochs to perform (" +
      "batch learning only, default = 500)", "E", 1,
      "-E <integer>"));
  newVector.add(new Option("\tUse word frequencies instead of " +
      "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary " +
      "of low frequency words (default = 0, i.e. don't prune)",
      "P", 1, "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less " +
      "than this frequency are ignored.\n\tIf periodic pruning " +
      "is turned on then this is also used to determine which\n\t" +
      "words to remove from the dictionary (default = 3).",
      "M", 1, "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and " +
      "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)",
      "norm", 1, "-norm <num>"));
  newVector.addElement(new Option(
      "\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase " +
      "before adding to the dictionary.",
      "lowercase", 0, "-lowercase"));
  newVector.addElement(new Option(
      "\tIgnore words that are in the stoplist.",
      "stoplist", 0, "-stoplist"));
  newVector.addElement(new Option(
      "\tA file containing stopwords to override the default ones.\n"
      + "\tUsing this option automatically sets the flag ('-stoplist') to use the\n"
      + "\tstoplist if the file exists.\n"
      + "\tFormat: one stopword per line, lines starting with '#'\n"
      + "\tare interpreted as comments and ignored.",
      "stopwords", 1, "-stopwords <file>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")",
      "tokenizer", 1, "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  return newVector.elements();
}
Developer ID: dsibournemouth, Project: autoweka, Lines of code: 65, Source: SGDText.java
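SGDText, like NaiveBayesMultinomialText, can prune its dictionary while it learns. -P sets how many instances pass between prunings (0 disables pruning), and -M sets the minimum word frequency a word needs to survive.

The sketch below shows that bookkeeping under the assumption that a word's frequency is simply its running token count. PruningDictionarySketch is a hypothetical name. The real classifier's counting, and the coefficient-based criterion (-min-coeff in Example 3), are not modelled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of incremental dictionary building with periodic pruning.
// ASSUMPTION: a word's "frequency" is its running token count.
class PruningDictionarySketch {
    final Map<String, Double> counts = new HashMap<>();
    final int periodicPruning; // -P: prune every N instances, 0 = never prune
    final double minWordFreq;  // -M: words counted fewer times are removed
    int instancesSeen = 0;

    PruningDictionarySketch(int periodicPruning, double minWordFreq) {
        this.periodicPruning = periodicPruning;
        this.minWordFreq = minWordFreq;
    }

    // Count the tokens of one training instance; prune every periodicPruning instances.
    void addInstance(List<String> tokens) {
        for (String t : tokens) {
            counts.merge(t, 1.0, Double::sum);
        }
        instancesSeen++;
        if (periodicPruning > 0 && instancesSeen % periodicPruning == 0) {
            prune();
        }
    }

    // Remove every word whose count is below the minimum frequency.
    void prune() {
        counts.values().removeIf(c -> c < minWordFreq);
    }
}
```

With -P 2 and -M 2, for example, the dictionary is checked after every second instance. At that point only words seen at least twice are kept.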
Example 6: listOptions
import java.util.Collections;
import java.util.Enumeration;
import java.util.Vector;
import weka.core.Option;
import weka.core.tokenizers.WordTokenizer; // import the required package/class
/**
 * Returns an enumeration describing the available options.
 *
 * @return an enumeration of all the available options.
 */
@Override
public Enumeration<Option> listOptions() {
  Vector<Option> newVector = new Vector<Option>();
  newVector.add(new Option("\tUse word frequencies instead of "
      + "binary bag of words.", "W", 0, "-W"));
  newVector.add(new Option("\tHow often to prune the dictionary "
      + "of low frequency words (default = 0, i.e. don't prune)", "P", 1,
      "-P <# instances>"));
  newVector.add(new Option("\tMinimum word frequency. Words with less "
      + "than this frequency are ignored.\n\tIf periodic pruning "
      + "is turned on then this is also used to determine which\n\t"
      + "words to remove from the dictionary (default = 3).", "M", 1,
      "-M <double>"));
  newVector.addElement(new Option(
      "\tNormalize document length (use in conjunction with -norm and "
      + "-lnorm)", "normalize", 0, "-normalize"));
  newVector.addElement(new Option(
      "\tSpecify the norm that each instance must have (default 1.0)", "norm",
      1, "-norm <num>"));
  newVector.addElement(new Option("\tSpecify L-norm to use (default 2.0)",
      "lnorm", 1, "-lnorm <num>"));
  newVector.addElement(new Option("\tConvert all tokens to lowercase "
      + "before adding to the dictionary.", "lowercase", 0, "-lowercase"));
  // Option names carry no leading dash (cf. "tokenizer", "stemmer")
  newVector.addElement(new Option(
      "\tThe stopwords handler to use (default Null).",
      "stopwords-handler", 1, "-stopwords-handler <spec>"));
  newVector.addElement(new Option(
      "\tThe tokenizing algorithm (classname plus parameters) to use.\n"
      + "\t(default: " + WordTokenizer.class.getName() + ")", "tokenizer", 1,
      "-tokenizer <spec>"));
  newVector.addElement(new Option(
      "\tThe stemming algorithm (classname plus parameters) to use.",
      "stemmer", 1, "-stemmer <spec>"));
  // Append the options inherited from the superclass
  newVector.addAll(Collections.list(super.listOptions()));
  return newVector.elements();
}
Developer ID: umple, Project: umple, Lines of code: 46, Source: NaiveBayesMultinomialText.java
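Every listOptions example takes the tokenizer as a -tokenizer <spec> value: a class name followed by that class's own options. The default is WordTokenizer.class.getName(), i.e. "weka.core.tokenizers.WordTokenizer".

The sketch below only shows how such a spec divides into those two parts, and it is a deliberate simplification. It splits at the first run of whitespace and does not interpret quotes. Weka itself parses option strings with helpers such as weka.core.Utils.splitOptions. TokenizerSpecSketch is a hypothetical name, and WordTokenizer's -delimiters option serves as the example parameter.

```java
// Hypothetical helper: splits a "-tokenizer <spec>" value into class name and options.
// ASSUMPTION: the class name is the first whitespace-separated word; quoting and
// escaping (handled in Weka by weka.core.Utils.splitOptions) are not interpreted.
class TokenizerSpecSketch {
    // Same value as WordTokenizer.class.getName() in the listOptions examples.
    static final String DEFAULT_TOKENIZER = "weka.core.tokenizers.WordTokenizer";

    // Returns {className, remainingOptions}; an empty spec falls back to the default.
    static String[] split(String spec) {
        String s = (spec == null) ? "" : spec.trim();
        if (s.isEmpty()) {
            return new String[] {DEFAULT_TOKENIZER, ""};
        }
        String[] parts = s.split("\\s+", 2);
        return new String[] {parts[0], parts.length > 1 ? parts[1] : ""};
    }

    // e.g. "weka.core.tokenizers.WordTokenizer" -> "WordTokenizer"
    static String simpleName(String className) {
        return className.substring(className.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        String[] parts = split("weka.core.tokenizers.WordTokenizer -delimiters \" .,\"");
        System.out.println(simpleName(parts[0]) + " with options: " + parts[1]);
    }
}
```

main prints: WordTokenizer with options: -delimiters " .,"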
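Note: the weka.core.tokenizers.WordTokenizer examples in this article were collected from source-code and documentation platforms such as GitHub and MSDocs. The snippets come from open-source projects by various contributors. Copyright remains with the original authors; consult each project's license before distributing or using the code, and do not republish without permission.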