This article collects typical usage examples of the Java class org.apache.lucene.analysis.miscellaneous.LengthFilter. If you are wondering how LengthFilter is used in practice, the selected examples below may help.
The LengthFilter class belongs to the org.apache.lucene.analysis.miscellaneous package. Six code examples are shown below, sorted by popularity by default.
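As a rough illustration of the filtering rule only, the plain-Java sketch below keeps strings whose length falls within an inclusive [min, max] range, which is the check LengthFilter applies to each token. The names LengthFilterSketch and filterByLength are made up for this sketch and are not part of Lucene; the real filter works on a TokenStream rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene code.
final class LengthFilterSketch {

    // Keeps tokens whose length, measured in UTF-16 code units
    // (String.length()), lies in the inclusive range [min, max].
    static List<String> filterByLength(List<String> tokens, int min, int max) {
        if (min < 0 || min > max) {
            throw new IllegalArgumentException("require 0 <= min <= max");
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "an", "the", "lucene", "analysis");
        // Only tokens of 3 to 6 characters survive.
        System.out.println(filterByLength(tokens, 3, 6)); // prints [the, lucene]
    }
}
```

Both bounds are inclusive, so a call with min 3 and max 1000 (as in Examples 2 and 3) drops only tokens of one or two characters in practice.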
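Example 1: create
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class

@Override
public TokenStream create(TokenStream tokenStream) {
    if (version.onOrAfter(Version.LUCENE_4_4)) {
        // Index created on or after Lucene 4.4: use the current LengthFilter.
        return new LengthFilter(tokenStream, min, max);
    } else {
        // Older index: fall back to the deprecated Lucene43LengthFilter,
        // which still takes the enablePositionIncrements flag.
        @SuppressWarnings("deprecation")
        final TokenStream filter = new Lucene43LengthFilter(enablePositionIncrements, tokenStream, min, max);
        return filter;
    }
}
Developer: baidu, Project: Elasticsearch, Lines of code: 11, Source: LengthTokenFilterFactory.java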
Example 2: transform
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
public Tuple2<Double, Multiset<String>> transform(Row row) throws IOException {
    Double label = row.getDouble(1);
    StringReader document = new StringReader(row.getString(0).replaceAll("br2n", ""));
    List<String> wordsList = new ArrayList<>();

    try (BulgarianAnalyzer analyzer = new BulgarianAnalyzer(BULGARIAN_STOP_WORDS_SET)) {
        // Filter chain: lowercase -> drop numbers -> keep tokens of 3 to 1000
        // characters -> stem -> shingles of 2 to 3 tokens.
        TokenStream stream = analyzer.tokenStream("words", document);
        TokenFilter lowerFilter = new LowerCaseFilter(stream);
        TokenFilter numbers = new NumberFilter(lowerFilter);
        TokenFilter length = new LengthFilter(numbers, 3, 1000);
        TokenFilter stemmer = new BulgarianStemFilter(length);
        TokenFilter ngrams = new ShingleFilter(stemmer, 2, 3);

        try (TokenFilter filter = ngrams) {
            CharTermAttribute termAtt = filter.addAttribute(CharTermAttribute.class);
            filter.reset();
            while (filter.incrementToken()) {
                String word = termAtt.toString().replace(",", "(comma)").replaceAll("\n|\r", "");
                // Skip terms containing "_" (ShingleFilter's default filler token).
                if (word.contains("_")) {
                    continue;
                }
                wordsList.add(word);
            }
            filter.end(); // complete the TokenStream workflow before close()
        }
    }

    Multiset<String> words = ConcurrentHashMultiset.create(wordsList);
    return new Tuple2<>(label, words);
}
Developer: mhardalov, Project: news-credibility, Lines of code: 32, Source: TokenTransform.java
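Example 2 skips every term that contains "_". A plausible reason is that a removed token leaves a position gap, and ShingleFilter fills such gaps with its default filler token "_". The plain-Java sketch below is a simplified model of that interaction, not Lucene's implementation: the names ShingleGapSketch, markGaps and bigramsWithoutFiller are hypothetical, and only two-word shingles are built.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only; this is NOT Lucene's ShingleFilter implementation.
final class ShingleGapSketch {

    // Same character as ShingleFilter's default filler token.
    static final String FILLER = "_";

    // Replaces every token whose length is outside [min, max] with the
    // filler, modelling the position gap that a removed token leaves behind.
    static List<String> markGaps(List<String> tokens, int min, int max) {
        List<String> positions = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            positions.add(len >= min && len <= max ? token : FILLER);
        }
        return positions;
    }

    // Builds two-word shingles from adjacent positions and skips any shingle
    // that contains the filler, as the loop in Example 2 does.
    static List<String> bigramsWithoutFiller(List<String> positions) {
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            String shingle = positions.get(i) + " " + positions.get(i + 1);
            if (shingle.contains(FILLER)) {
                continue;
            }
            shingles.add(shingle);
        }
        return shingles;
    }

    public static void main(String[] args) {
        List<String> positions = markGaps(List.of("new", "of", "york", "city"), 3, 1000);
        System.out.println(positions);                       // prints [new, _, york, city]
        System.out.println(bigramsWithoutFiller(positions)); // prints [york city]
    }
}
```

Without the gap marker, "new" and "york" would look adjacent and produce a shingle that never occurred in the text. Like the original loop, this check also discards any genuine token that happens to contain an underscore.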
Example 3: main
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
public static void main(String[] args) throws IOException {
    // Quick checks of how NumberUtils treats decimal points and commas.
    System.out.println(NumberUtils.isDigits("12345"));
    System.out.println(NumberUtils.isDigits("12345.1"));
    System.out.println(NumberUtils.isDigits("12345,2"));
    System.out.println(NumberUtils.isNumber("12345"));
    System.out.println(NumberUtils.isNumber("12345.1"));
    System.out.println(NumberUtils.isNumber("12345,2".replace(",", ".")));
    System.out.println(NumberUtils.isNumber("12345,2"));

    StringReader input = new StringReader(
            "Правя тест на класификатор и после др.Дулитъл, пада.br2n ще се оправя с данните! които,са много зле. Но това е по-добре. Но24"
                    .replaceAll("br2n", ""));
    LetterTokenizer tokenizer = new LetterTokenizer();
    tokenizer.setReader(input);

    // Filter chain: stop words -> keep tokens of 3 to 1000 characters -> stem -> 2-token shingles.
    TokenFilter stopFilter = new StopFilter(tokenizer, BULGARIAN_STOP_WORDS_SET);
    TokenFilter length = new LengthFilter(stopFilter, 3, 1000);
    TokenFilter stemmer = new BulgarianStemFilter(length);
    TokenFilter ngrams = new ShingleFilter(stemmer, 2, 2);

    try (TokenFilter filter = ngrams) {
        CharTermAttribute termAtt = filter.addAttribute(CharTermAttribute.class);
        filter.reset();
        while (filter.incrementToken()) {
            String word = termAtt.toString().replaceAll(",", "\\.").replaceAll("\n|\r", "");
            System.out.println(word);
        }
        filter.end(); // complete the TokenStream workflow before close()
    }
}
Developer: mhardalov, Project: news-credibility, Lines of code: 32, Source: EgdeMain.java
Example 4: create
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class

@Override
public TokenStream create(TokenStream tokenStream) {
    return new LengthFilter(tokenStream, min, max);
}
Developer: justor, Project: elasticsearch_my, Lines of code: 5, Source: LengthTokenFilterFactory.java
Example 5: wrapComponents
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class

@Override
protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
    TokenStream ts = components.getTokenStream();
    // Drop any token longer than 1024 characters; there is no lower bound (min = 0).
    LengthFilter dropLongTokens = new LengthFilter(ts, 0, 1024);
    return new TokenStreamComponents(components.getTokenizer(), dropLongTokens);
}
Developer: isoboroff, Project: basekb-search, Lines of code: 7, Source: SafetyAnalyzer.java
Example 6: create
import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class

@Override
public LengthFilter create(TokenStream input) {
    // Older constructor signature that takes an enablePositionIncrements flag.
    return new LengthFilter(enablePositionIncrements, input, min, max);
}
Developer: pkarmstr, Project: NYBC, Lines of code: 5, Source: LengthFilterFactory.java
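Note: The org.apache.lucene.analysis.miscellaneous.LengthFilter examples in this article were collected from GitHub, MSDocs, and other source-code and documentation hosting platforms. The snippets were selected from open-source projects; copyright in the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.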