In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to:
bla~* //invalid query
Meaning: match words that begin with "bla" or with something similar to "bla".
Update:
What I do now, which works for small input, is use the following field type (snippet of the Solr schema):
<fieldtype name="text_ngrams" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldtype>
In case you don't use Solr, this does the following:
Index time: index the data into a field that contains all prefixes (2 to 15 characters long) of each word in my (short) input.
Search time: use only the ~ operator, since the prefixes are explicitly present in the index.
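For readers without Solr at hand, the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not what Lucene does internally: the helper names (`edge_ngrams`, `build_index`, `fuzzy_prefix_search`) and the `max_edits=1` threshold are made up for the example, and the distance is plain Levenshtein, whereas Lucene's fuzzy matching differs in details.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front-edge n-grams of a lowercased token (mirrors minGramSize/maxGramSize)."""
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_index(words):
    """Index time: map every prefix of every word back to the original word."""
    index = {}
    for word in words:
        for gram in edge_ngrams(word):
            index.setdefault(gram, set()).add(word)
    return index


def fuzzy_prefix_search(index, query, max_edits=1):
    """Search time: fuzzy-match the query against the stored prefixes only.

    max_edits is a hypothetical threshold chosen for this sketch.
    """
    query = query.lower()
    hits = set()
    for gram, words in index.items():
        if edit_distance(query, gram) <= max_edits:
            hits |= words
    return hits


index = build_index(["Blackbird", "blue", "table"])
print(fuzzy_prefix_search(index, "bla"))  # words whose prefix is within 1 edit of "bla"
print(fuzzy_prefix_search(index, "bka"))  # a typo in the prefix still finds "Blackbird"
```

The cost is visible in `build_index`: every word contributes one index entry per prefix length, which is why the approach is only comfortable for short input.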