In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.
Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.
The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.
Other features typical of full-text search are
- lexical analysis or tokenization—breaking a
block of unstructured text into
individual words, phrases, and
special tokens
- morphological
analysis, or stemming—collapsing variations
of a given word into one index term;
for example, treating "mice" and
"mouse", or "electrification" and
"electric" as the same word
- ranking—measuring the
similarity of a matching record to
the query string
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…