-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.2.2
-
Component/s: Index Maintenance
-
Environment:Ubuntu 11.04
-
ALL
After upgrading to 2.2.2 (and php driver 1.3.1) we've started noticing some odd timeouts. We use a multikey index on a set of "digested" words as a pseudo-fulltext search, and then query with $all : [ ... ], which has worked very well.
After the upgrade, the query optimizer doesn't seem to want to use those indexes in many cases when combined with sort, which results in awful nscanned:
data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).sort( { last_seen: -1 } ).limit(500).explain(); { "cursor" : "BtreeCursor last_seen_-1", "isMultiKey" : false, "n" : 382, "nscannedObjects" : 596759, "nscanned" : 596759, "nscannedObjectsAllPlans" : 716532, "nscannedAllPlans" : 716532, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 131, "nChunkSkips" : 0, "millis" : 3710, "indexBounds" : { "last_seen" : [ [ { "$maxElement" : 1 }, { "$minElement" : 1 } ] ] }, "server" : "xx:27017" }
If we provide a hint, performance is better, but still not great because only the first term inside $all appears to be used for the index. I'm not sure if the previous behavior would have been to use both (i.e. find both sets then compute intersection), but performance used to be much better.
data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).hint({ ftx : 1 }).sort( { last_seen: -1 } ).limit(500).explain(); { "cursor" : "BtreeCursor ftx_1", "isMultiKey" : true, "n" : 382, "nscannedObjects" : 131956, "nscanned" : 131956, "nscannedObjectsAllPlans" : 131956, "nscannedAllPlans" : 131956, "scanAndOrder" : true, "indexOnly" : false, "nYields" : 174, "nChunkSkips" : 0, "millis" : 5901, "indexBounds" : { "ftx" : [ [ "the", "the" ] ] }, "server" : "xx:27017" }
If we manually reorder the terms inside $all, the result is pretty good, similar to past performance:
data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "xx", "the" ] } } ).hint({ ftx : 1 }).sort( { last_seen: -1 } ).limit(500).explain(); { "cursor" : "BtreeCursor ftx_1", "isMultiKey" : true, "n" : 382, "nscannedObjects" : 511, "nscanned" : 511, "nscannedObjectsAllPlans" : 511, "nscannedAllPlans" : 511, "scanAndOrder" : true, "indexOnly" : false, "nYields" : 3, "nChunkSkips" : 0, "millis" : 20, "indexBounds" : { "ftx" : [ [ "xx", "xx" ] ] }, "server" : "xx:27017" }
data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).hint({ ftx : 1 }).limit(500).explain(); { "cursor" : "BtreeCursor ftx_1", "isMultiKey" : true, "n" : 382,
Finally, the behavior of the optimizer seems unpredictable, for example some (absolutely catastrophic) slow queries logged like this:
Wed Dec 12 14:30:21 [conn50070] query ftx.a_ftx query: { $query: { ftx: { $all: [ "hamilton", "jo" ] } }, $orderby: { last_seen: -1 } } ntoreturn:500 ntoskip:0 nscanned:596752 keyUpdates:0 numYields: 724 locks(micros) r:39541636 nreturned:3 reslen:143 20917ms
behave reasonably if reproduced exactly via the mongo shell:
data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "hamilton", "jo" ] } } ).sort( { last_seen: -1 } ).limit(500).explain(); { "cursor" : "BtreeCursor ftx_1", "isMultiKey" : true, "n" : 3, "nscannedObjects" : 148, "nscanned" : 148, "nscannedObjectsAllPlans" : 444, "nscannedAllPlans" : 444, "scanAndOrder" : true, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 1, "indexBounds" : { "ftx" : [ [ "hamilton", "hamilton" ] ] }, "server" : "xx:27017" }