Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
695 views
in Technique[技术] by (71.8m points)

java - Using Lucene Analyzer Without Indexing - Is My Approach Reasonable?

My objective is to leverage some of Lucene's many tokenizers and filters to transform input text, but without the creation of any indexes.

For example, given this (contrived) input string...

" Someone’s - [texté] goes here, foo . "

...and a Lucene analyzer like this...

Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("icu")
        .addTokenFilter("lowercase")
        .addTokenFilter("icuFolding")
        .build();

I want to get the following output:

someone's texte goes here foo

The below Java method does what I want.

But is there a better (i.e. more typical and/or concise) way that I should be doing this?

I am specifically thinking about the way I have used TokenStream and CharTermAttribute, since I have never used them like this before. Feels clunky.

Here is the code:

Lucene 8.3.0 imports:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

My method:

private String transform(String input) throws IOException {

    Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("icu")
            .addTokenFilter("lowercase")
            .addTokenFilter("icuFolding")
            .build();

    TokenStream ts = analyzer.tokenStream("myField", new StringReader(input));
    CharTermAttribute charTermAtt = ts.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    try {
        ts.reset();
        while (ts.incrementToken()) {
            sb.append(charTermAtt.toString()).append(" ");
        }
        ts.end();
    } finally {
        ts.close();
    }
    return sb.toString().trim();
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I have been using this set-up for a few weeks without issue. I have not found a more concise approach. I think the code in the question is OK.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...