Elasticsearch 5.x introduced some (breaking) changes to the Suggester API (Documentation). The most notable change is the following:
Completion suggester is document-oriented
Suggestions are aware of the document they belong to. Now, associated documents (_source) are returned as part of completion suggestions.
In short, all completion queries return all matching documents instead of just matched words. And herein lies the problem - duplication of autocompleted words if they occur in more than one document.
Let's say we have this simple mapping:
{
  "my-index": {
    "mappings": {
      "users": {
        "properties": {
          "firstName": {
            "type": "text"
          },
          "lastName": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple"
          }
        }
      }
    }
  }
}
With a few test documents:
{
  "_index": "my-index",
  "_type": "users",
  "_id": "1",
  "_source": {
    "firstName": "John",
    "lastName": "Doe",
    "suggest": [
      {
        "input": [
          "John",
          "Doe"
        ]
      }
    ]
  }
},
{
  "_index": "my-index",
  "_type": "users",
  "_id": "2",
  "_source": {
    "firstName": "John",
    "lastName": "Smith",
    "suggest": [
      {
        "input": [
          "John",
          "Smith"
        ]
      }
    ]
  }
}
And a by-the-book query:
POST /my-index/_suggest?pretty
{
  "my-suggest" : {
    "text" : "joh",
    "completion" : {
      "field" : "suggest"
    }
  }
}
Which yields the following results:
{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "1",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Doe",
            "suggest": [
              {
                "input": [
                  "John",
                  "Doe"
                ]
              }
            ]
          }
        },
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "2",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Smith",
            "suggest": [
              {
                "input": [
                  "John",
                  "Smith"
                ]
              }
            ]
          }
        }
      ]
    }
  ]
}
In short, the completion suggestion for the text "joh" returned two (2) documents, both Johns, and both had the same value in the text property.
However, I would like to receive one (1) word. Something simple like this:
{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [
        "John"
      ]
    }
  ]
}
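To make the gap between the two responses concrete, here is a minimal client-side sketch in Python that collapses the options of a suggest response into distinct words. It is only a workaround under stated assumptions, not a server-side fix: the helper name unique_suggestions is hypothetical, and the response is assumed to be the already-parsed JSON body shown above (trimmed here to the fields the helper reads).

```python
from typing import Any, Dict, List


def unique_suggestions(response: Dict[str, Any], name: str) -> List[str]:
    """Return the distinct option texts of one named suggester, in first-seen order.

    `response` is assumed to be the parsed JSON body of a suggest request;
    `name` is the suggester name used in the query (e.g. "my-suggest").
    """
    seen = set()
    words: List[str] = []
    for entry in response.get(name, []):
        for option in entry.get("options", []):
            text = option["text"]
            if text not in seen:
                seen.add(text)
                words.append(text)
    return words


# Trimmed copy of the response above: two documents, same suggested text.
response = {
    "my-suggest": [
        {
            "text": "joh",
            "offset": 0,
            "length": 3,
            "options": [
                {"text": "John", "_id": "1"},
                {"text": "John", "_id": "2"},
            ],
        }
    ]
}

print(unique_suggestions(response, "my-suggest"))  # ['John']
```

The obvious drawback is that duplicates are dropped only after the suggester has applied its result limit, so the client can end up with fewer distinct words than it asked for, and every _source is still sent over the wire.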
Question: how can I implement a word-based completion suggester? There is no need to return any document-related data, since I don't need it at this point.
Is the "Completion Suggester" even appropriate for my scenario? Or should I use a completely different approach?
EDIT:
As many of you pointed out, an additional completion-only index would be a viable solution. However, I can see multiple issues with this approach:
- Keeping the new index in sync.
- Auto-completing subsequent words would probably be global, instead of narrowed down. For example, say you have the following words in the additional index: "John", "Doe", "David", "Smith". When querying for "John D", the result for the incomplete word should be "Doe" and not "Doe", "David".
To overcome the second point, only indexing single words wouldn't be enough, since you would also need to map all words to documents in order to properly narrow down auto-completing subsequent words. And with this, you actually have the same problem as querying the original index. Therefore, the additional index doesn't make sense anymore.
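The difference between global and narrowed completion can be sketched in a few lines of Python. This is an in-memory illustration only, not Elasticsearch code: the two documents ("John Doe" and "David Smith") and the function names complete_global and complete_narrowed are assumptions made up for the example.

```python
from typing import List, Sequence

# Hypothetical documents, each reduced to the words it contributes.
DOCS: List[List[str]] = [["John", "Doe"], ["David", "Smith"]]


def complete_global(docs: Sequence[Sequence[str]], prefix: str) -> List[str]:
    """Complete `prefix` against every word of every document (word-only index)."""
    p = prefix.lower()
    return sorted({w for doc in docs for w in doc if w.lower().startswith(p)})


def complete_narrowed(docs: Sequence[Sequence[str]], query: str) -> List[str]:
    """Complete the last token of `query`, using only documents that contain
    all of the preceding tokens (this needs the word-to-document mapping)."""
    tokens = query.split()
    if not tokens:
        return []
    *context, prefix = (t.lower() for t in tokens)
    matches = set()
    for doc in docs:
        lowered = {w.lower() for w in doc}
        if all(c in lowered for c in context):
            matches.update(
                w for w in doc
                if w.lower().startswith(prefix) and w.lower() not in context
            )
    return sorted(matches)


print(complete_global(DOCS, "D"))         # ['David', 'Doe']
print(complete_narrowed(DOCS, "John D"))  # ['Doe']
```

The narrowed variant only works because each word is still tied to the document it came from, which is exactly the information a word-only index would throw away.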