Description
Hi,
It seems that a find operation using a character-class regex results in a full index scan, even when the regex is anchored with ^.
When the field is an array, this can cause a huge performance hit, because each document is accessed once for every one of its indexed array elements.
Note that the following example reports 8 scanned objects even though the collection holds only 5 documents:
db.foo.save({ "_id" : 1, "keywords" : [ "a" ] })
db.foo.save({ "_id" : 2, "keywords" : [ "b" ] })
db.foo.save({ "_id" : 3, "keywords" : [ "c" ] })
db.foo.save({ "_id" : 4, "keywords" : [ "a", "b" ] })
db.foo.save({ "_id" : 5, "keywords" : [ "a", "b", "c" ] })
db.foo.ensureIndex({ keywords:1 })

> db.foo.find({ keywords:/^[bc]/ }).explain()
{
    "cursor" : "BtreeCursor keywords_1 multi",
    "isMultiKey" : true,
    "n" : 4,
    "nscannedObjects" : 8,
    "nscanned" : 8,
    "nscannedObjectsAllPlans" : 8,
    "nscannedAllPlans" : 8,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "keywords" : [
            [
                "",
                {

                }
            ],
            [
                /^[bc]/,
                /^[bc]/
            ]
        ]
    },
    "server" : "Jeffs-MacBook-Air.local:27017"
}
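The figure of 8 is consistent with one index key per array element: the five documents above hold eight keyword values in total, so a scan over the whole index range visits all of them. A minimal plain-JavaScript sketch of that arithmetic follows. The `indexKeys` helper is hypothetical and only models a multikey index as a sorted list of (value, _id) pairs; it is an assumption for illustration, not how the server stores keys.

```javascript
// Documents from the example above.
var docs = [
  { _id: 1, keywords: ["a"] },
  { _id: 2, keywords: ["b"] },
  { _id: 3, keywords: ["c"] },
  { _id: 4, keywords: ["a", "b"] },
  { _id: 5, keywords: ["a", "b", "c"] }
];

// Hypothetical model of a multikey index: one (value, _id) key per array element,
// ordered by value and then by _id.
function indexKeys(documents) {
  var keys = [];
  documents.forEach(function (doc) {
    doc.keywords.forEach(function (value) {
      keys.push({ value: value, _id: doc._id });
    });
  });
  keys.sort(function (x, y) {
    if (x.value < y.value) return -1;
    if (x.value > y.value) return 1;
    return x._id - y._id;
  });
  return keys;
}

var keys = indexKeys(docs);

// With bounds ["", {}] every key is visited and tested against the regex.
var scanned = keys.length;

// Distinct documents with at least one key matching /^[bc]/.
var matchedIds = {};
keys.forEach(function (key) {
  if (/^[bc]/.test(key.value)) {
    matchedIds[key._id] = true;
  }
});
var matched = Object.keys(matchedIds).length;

// scanned is 8 and matched is 4, in line with "nscanned" : 8 and "n" : 4.
```

Under this model the gap between 8 and 5 grows with the average array length, which is why the cost is most visible on array fields.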
The workaround seems to be to specify each element of the character class individually:
> db.foo.find({ keywords:{ $in:[ /^b/, /^c/ ] }}).explain()
{
    "cursor" : "BtreeCursor keywords_1 multi",
    "isMultiKey" : true,
    "n" : 4,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 5,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "keywords" : [
            [
                "b",
                "d"
            ],
            [
                /^b/,
                /^b/
            ],
            [
                /^c/,
                /^c/
            ]
        ]
    },
    "server" : "Jeffs-MacBook-Air.local:27017"
}
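For simple cases the workaround can be generated mechanically. The sketch below assumes the character class is supplied as a plain list of literal characters; the helper names `escapeRegex` and `anchoredClassToIn` are hypothetical (they are not part of the shell), and the code does not attempt to parse general regex syntax such as ranges or negated classes.

```javascript
// Escape regex metacharacters so that each character is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turn the members of an anchored class such as /^[bc]/
// into the equivalent { $in: [ /^b/, /^c/ ] } condition.
function anchoredClassToIn(chars) {
  return {
    $in: chars.map(function (ch) {
      return new RegExp("^" + escapeRegex(ch));
    })
  };
}

// Equivalent of { keywords: { $in: [ /^b/, /^c/ ] } }.
var filter = { keywords: anchoredClassToIn(["b", "c"]) };

// In the mongo shell this could then be passed to find:
//   db.foo.find(filter).explain()
```

With the five documents above, three array elements start with "b" and two start with "c", so a range from "b" to "d" covers five index keys, which is consistent with the "nscanned" : 5 reported here.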
Issue Links
- is duplicated by
  - SERVER-22722 Ranged regex uses inefficient indexBounds (Closed)