bottom line / tl;dr: Index b
can be 'skipped' if a
and c
are queried for equality or inequality, but not, for instance, for sorts on c
.
This is a very good question. Unfortunately, I couldn't find anything that authoritatively answers this in greater detail. I believe the performance of such queries has improved over the last years, so I wouldn't trust old material on the topic.
The whole thing is quite complicated because it depends on the selectivity on your indexes and whether you query for equality, inequality and/or sort, so explain()
is your only friend, but here are some things I found:
Caveat: What comes now is a mixture of experimental results, reasoning and guessing. I might be stretching Kyle's analogy too far, and I might even be completely wrong (and unlucky, because my test results loosely match my reasoning).
It is clear that the index of A can be used, which, depending on the selectivity of A, is certainly very helpful. 'Skipping' B can be tricky, or not. Let's keep this similar to Kyle's cookbook example:
French
Beef
...
Chicken
Coq au Vin
Roasted Chicken
Lamb
...
...
If you now ask me to find some French dish called "Chateaubriand", I can use index A
and, because I don't know the ingredient, will have to scan all dishes in A
. On the other hand, I do know that the list of dishes in each category is sorted through the index C
, so I will only have to look for the strings starting with, say, "Cha" in each ingredient-list. If there are 50 ingredients, I will need 50 lookups instead of just one, but that is a lot better than having to scan every French dish!
In my experiments, the number was a lot smaller than the number of distinct values in b
: it never seemd to exceed 2. However, I tested this only with a single collection, and it probably has to do with the selectivity of the b
-index.
If you asked me to give you an alphabetically sorted list of all French dishes, though, I'd be in trouble. Now the index on C
is worthless, I'd have to merge-sort all those index lists. I will have to scan every element to do so.
This reflects in my tests. Here are some simplified results. The original collection has datetimes, ints and strings, but I wanted to keep things simple, so it's now all ints.
Essentially, there are only two classes of queries: those where nscanned
<= 2 * limit
, and those that have to scan the entire collection (120k documents). The index is {a, b, c}
:
// fast (range query on c while skipping b)
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }});
// slow (sorting)
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "c" : -1});
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "b" : -1});
// fast (can sort on c if b included in the query)
> db.Test.find({"a" : 43, "b" : 7887, "c" : { $lte : 45454 }}).sort({ "c" : -1});
// fast (older tutorials claim this is slow)
> db.Test.find({"a" : {$gte : 43}, "c" : { $lte : 45454 }});
Your mileage will vary.