Hi,
It seems that when a find operation uses a regex containing a character class, the query scans the full index even when the regex is anchored.
When the field is an array, this can cause a huge performance hit, because each document is accessed once for every one of its indexed array elements.
Note how we have 8 scanned objects in the following example:
db.foo.save({ "_id" : 1, "keywords" : [ "a" ] })
db.foo.save({ "_id" : 2, "keywords" : [ "b" ] })
db.foo.save({ "_id" : 3, "keywords" : [ "c" ] })
db.foo.save({ "_id" : 4, "keywords" : [ "a", "b" ] })
db.foo.save({ "_id" : 5, "keywords" : [ "a", "b", "c" ] })
db.foo.ensureIndex({ keywords:1 })
> db.foo.find({ keywords:/^[bc]/ }).explain()
{
    "cursor" : "BtreeCursor keywords_1 multi",
    "isMultiKey" : true,
    "n" : 4,
    "nscannedObjects" : 8,
    "nscanned" : 8,
    "nscannedObjectsAllPlans" : 8,
    "nscannedAllPlans" : 8,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "keywords" : [
            [ "", { } ],
            [ /^[bc]/, /^[bc]/ ]
        ]
    },
    "server" : "Jeffs-MacBook-Air.local:27017"
}
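To make the count of 8 easier to follow, here is a rough sketch in plain JavaScript (Node, not the mongo shell). It assumes that a multikey index holds one entry per array element and that the [ "", { } ] bounds cover every string key. The names indexKeys, scanned and matchedIds are made up for illustration and say nothing about how the server is implemented.

```javascript
// Same five documents as in the example above.
const docs = [
  { _id: 1, keywords: ["a"] },
  { _id: 2, keywords: ["b"] },
  { _id: 3, keywords: ["c"] },
  { _id: 4, keywords: ["a", "b"] },
  { _id: 5, keywords: ["a", "b", "c"] },
];

// Assumption: a multikey index stores one entry per array element.
const indexKeys = docs.flatMap(d => d.keywords.map(k => ({ key: k, id: d._id })));

// Bounds [ "", { } ] include every string key, so every entry is visited.
const scanned = indexKeys.filter(e => e.key >= "");

// The regex is then applied to each visited key; matching documents are deduplicated.
const matchedIds = new Set(scanned.filter(e => /^[bc]/.test(e.key)).map(e => e.id));

console.log(scanned.length);  // 8, same as nscanned
console.log(matchedIds.size); // 4, same as n
```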
The workaround seems to be to specify each element of the character class individually:
> db.foo.find({ keywords:{ $in:[ /^b/, /^c/ ] }}).explain()
{
    "cursor" : "BtreeCursor keywords_1 multi",
    "isMultiKey" : true,
    "n" : 4,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 5,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "keywords" : [
            [ "b", "d" ],
            [ /^b/, /^b/ ],
            [ /^c/, /^c/ ]
        ]
    },
    "server" : "Jeffs-MacBook-Air.local:27017"
}
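If the workaround has to be applied in application code, something along these lines might do it. This is only a sketch: expandAnchoredClass is a hypothetical helper, not part of MongoDB or any driver, and it deliberately accepts only the simplest patterns (an anchored class of plain letters or digits, optionally followed by literal letters or digits). Ranges such as [a-c], negated classes and quantifiers are rejected rather than guessed at.

```javascript
// Hypothetical helper: rewrites "^[bc]" into { $in: [ /^b/, /^c/ ] }.
function expandAnchoredClass(source) {
  // Only "^[<letters/digits>]<letters/digits>" is supported.
  const m = /^\^\[([A-Za-z0-9]+)\]([A-Za-z0-9]*)$/.exec(source);
  if (!m) {
    throw new Error("unsupported pattern: " + source);
  }
  const chars = Array.from(new Set(m[1])); // drop repeated characters
  const rest = m[2];                       // literal suffix, may be empty
  return { $in: chars.map(c => new RegExp("^" + c + rest)) };
}

const query = { keywords: expandAnchoredClass("^[bc]") };
console.log(query.keywords.$in.map(String).join(", ")); // /^b/, /^c/
```

The resulting query object has the same shape as the $in query above and could be passed to find() as is.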
- is duplicated by: SERVER-22722 Ranged regex uses inefficient indexBounds (Closed)