The main case here is that a "text" search result is generally taking precedence over other filter conditions in the query, and as such it becomes necessary to "first" obtain results from the "text" component, and then basically "scan" for other conditions in the document.
This type of search can be difficult to optimise along with a "range" or any type of "inequality" match condition in conjuntion with the text search results, and is mostly due to how MongoDB handles this "special" index type.
For a short demonstration, consider the following basic setup:
db.texty.drop();
db.texty.insert([
{ "a": "a", "text": "something" },
{ "a": "b", "text": "something" },
{ "a": "b", "text": "nothing much" },
{ "a": "c", "text": "something" }
])
db.texty.createIndex({ "text": "text" })
db.texty.createIndex({ "a": 1 })
So if you wanted to look at this with a text search condition as well as a range consideration on the other field ( { "$lt": "c" }
), then you could handle as follows:
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()
With the explain output such as ( important part ):
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"a" : {
"$lt" : "c"
}
},
"inputStage" : {
"stage" : "TEXT",
"indexPrefix" : {
},
"indexName" : "text_text",
"parsedTextQuery" : {
"terms" : [
"someth"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "TEXT_OR",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "text_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "backward",
"indexBounds" : {
}
}
}
}
}
},
Which is basically saying "first get me the text results and then filter those results fetched by the other condition". So clearly only the "text" index is being used here and then all the results it returns are subsequently being filtered by examining the content.
This is not optimal for two reasons, being that it may likely be that the data is best constrained by the "range" condition rather than the matches from the text search. Secondly, even though there is an index on the other data, it is not being used here for comparison. So rather the whole document is loaded for each result and the filter is tested.
You might then consider a "compound" index format here, and it would seem initially logical that if the "range" is more specific to selection, then include that as the prefixed order of the indexed keys:
db.texty.dropIndexes();
db.texty.createIndex({ "a": 1, "text": "text" })
But there is a catch here, since when you attempt to run the query again:
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } })
It would result in an error:
Error: error: {
"waitedMS" : NumberLong(0),
"ok" : 0,
"errmsg" : "error processing query: ns=test.textyTree: $and
a $lt "c"
TEXT : query=something, language=english, caseSensitive=0, diacriticSensitive=0, tag=NULL
Sort: {}
Proj: {}
planner returned error: failed to use text index to satisfy $text query (if text index is compound, are equality predicates given for all prefix fields?)",
"code" : 2
}
So even though that may seem "optimal", the way MongoDB processes the query ( and really index selection ) for the special "text" index, it is just not possible for this "exclusion" outside of the range to be possible.
You can however perform an "equality" match on this in a very efficient way:
db.texty.find({ "a": "b", "$text": { "$search": "something" } }).explain()
With the explain output:
"winningPlan" : {
"stage" : "TEXT",
"indexPrefix" : {
"a" : "b"
},
"indexName" : "a_1_text_text",
"parsedTextQuery" : {
"terms" : [
"someth"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "TEXT_OR",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1,
"_fts" : "text",
"_ftsx" : 1
},
"indexName" : "a_1_text_text",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "backward",
"indexBounds" : {
}
}
}
}
},
So the index is used and it can be shown to "pre-filter" the content provided to the text matching by the output of the other condition.
If indeed you keep the "prefix" to the index as the "text" field(s) to search however:
db.texty.dropIndexes();
db.texty.createIndex({ "text": "text", "a": 1 })
Then perform the search:
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()
Then you see a similar result to the above "equality" match:
"winningPlan" : {
"stage" : "TEXT",
"indexPrefix" : {
},
"indexName" : "text_text_a_1",
"parsedTextQuery" : {
"terms" : [
"someth"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
},
"inputStage" : {
"stage" : "TEXT_MATCH",
"inputStage" : {
"stage" : "TEXT_OR",
"filter" : {
"a" : {
"$lt" : "c"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_fts" : "text",
"_ftsx" : 1,
"a" : 1
},
"indexName" : "text_text_a_1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "backward",
"indexBounds" : {
}
}
}
}
},
The big differnce here from the first attempt being where filter
is placed in the processing chain, indicating that whilst not a "prefix" match ( which is most optimal ), the content is indeed being scanned off of the index "before" being sent to the "text" stage.
So it is "pre-filtered" but not of course in the most optimal way, and this is due to the very nature of how the "text" index is used. So if you just considered the plain range on an index by itself:
db.texty.createIndex({ "a": 1 })
db.texty.find({ "a": { "$lt": "c" } }).explain()
Then the explain output:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1
},
"indexName" : "a_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" :