Type: Improvement
Resolution: Unresolved
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: 2.4.4
Component/s: Querying
Labels: storch
Assigned Teams: Query Optimization
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Hi,

It seems that using a character-class regex in a find operation results in a full index scan, even when the regex is anchored with ^.

When the indexed field is an array, this can cause a huge performance hit, because each document is accessed once for every indexed array element it contains.

Note that nscannedObjects is 8 in the following example, even though the collection holds only 5 documents and 4 of them match:

db.foo.save({ "_id" : 1, "keywords" : [  "a" ] })
db.foo.save({ "_id" : 2, "keywords" : [  "b" ] })
db.foo.save({ "_id" : 3, "keywords" : [  "c" ] })
db.foo.save({ "_id" : 4, "keywords" : [  "a",  "b" ] })
db.foo.save({ "_id" : 5, "keywords" : [  "a",  "b",  "c" ] })
db.foo.ensureIndex({ keywords:1 })

> db.foo.find({ keywords:/^[bc]/ }).explain()
{
	"cursor" : "BtreeCursor keywords_1 multi",
	"isMultiKey" : true,
	"n" : 4,
	"nscannedObjects" : 8,
	"nscanned" : 8,
	"nscannedObjectsAllPlans" : 8,
	"nscannedAllPlans" : 8,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"keywords" : [
			[
				"",
				{

				}
			],
			[
				/^[bc]/,
				/^[bc]/
			]
		]
	},
	"server" : "Jeffs-MacBook-Air.local:27017"
}
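
To see where the 8 comes from, here is a rough sketch in plain JavaScript (runnable in Node, no server needed). It is only a simplified model of a multikey index, not MongoDB's actual implementation, and the names indexEntries and countScanned are made up for illustration. It assumes one index entry per array element and compares a scan over the whole string range ["", {}] with a scan limited to the prefix range "b" to "d".

```javascript
// Documents from the report; only the indexed field matters here.
const docs = [
  { _id: 1, keywords: ["a"] },
  { _id: 2, keywords: ["b"] },
  { _id: 3, keywords: ["c"] },
  { _id: 4, keywords: ["a", "b"] },
  { _id: 5, keywords: ["a", "b", "c"] },
];

// Assumption: a multikey index holds one entry per array element, sorted by key.
const indexEntries = docs
  .flatMap((doc) => doc.keywords.map((key) => ({ key, id: doc._id })))
  .sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));

// Count entries with lower <= key < upper (upper === null means no upper bound).
function countScanned(lower, upper) {
  return indexEntries.filter(
    (e) => e.key >= lower && (upper === null || e.key < upper)
  ).length;
}

const fullStringRange = countScanned("", null); // models bounds ["", {}]
const prefixRange = countScanned("b", "d");     // models bounds ["b", "d"]

// Distinct documents with at least one key matching the regex.
const matchingDocs = new Set(
  indexEntries.filter((e) => /^[bc]/.test(e.key)).map((e) => e.id)
).size;

console.log(fullStringRange, prefixRange, matchingDocs); // prints "8 5 4"
```

Under this model the full string range touches all 8 index entries, which lines up with nscanned above, while only 5 entries start with "b" or "c" and 4 distinct documents match.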

The workaround seems to be to list each member of the character class as its own anchored regex in an $in, which brings nscanned down to 5:

> db.foo.find({ keywords:{ $in:[ /^b/, /^c/ ] }}).explain()
{
	"cursor" : "BtreeCursor keywords_1 multi",
	"isMultiKey" : true,
	"n" : 4,
	"nscannedObjects" : 5,
	"nscanned" : 5,
	"nscannedObjectsAllPlans" : 5,
	"nscannedAllPlans" : 5,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"keywords" : [
			[
				"b",
				"d"
			],
			[
				/^b/,
				/^b/
			],
			[
				/^c/,
				/^c/
			]
		]
	},
	"server" : "Jeffs-MacBook-Air.local:27017"
}
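
If the character class is built at runtime, the rewrite could be automated on the client side. The following is a minimal sketch, not anything the server or shell provides: expandAnchoredClass is a hypothetical helper, and it only handles the simple shape ^[...] with plain alphanumeric members and no flags, returning the original regex untouched otherwise. Whether the resulting $in gets tight index bounds is based only on the 2.4.4 explain output above.

```javascript
// Hypothetical helper: turn /^[bc]/ into [/^b/, /^c/].
function expandAnchoredClass(regex) {
  // Accept only "^[members]rest" where members are plain letters or digits.
  // Ranges (a-z), negation ([^...]) and flags fall through unchanged.
  const m = /^\^\[([A-Za-z0-9]+)\](.*)$/.exec(regex.source);
  if (m === null || regex.flags !== "") {
    return [regex];
  }
  const [, members, rest] = m;
  // One anchored regex per distinct class member, keeping the rest of the pattern.
  return [...new Set(members)].map((ch) => new RegExp("^" + ch + rest));
}

// Builds the same query shape as the workaround in the report.
const query = { keywords: { $in: expandAnchoredClass(/^[bc]/) } };
console.log(query.keywords.$in.map((r) => r.source)); // [ '^b', '^c' ]
```

The fallback to the original regex is deliberate: anything the helper does not recognise is passed through as-is, so the query still returns the same results, just without the narrower scan.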

Is duplicated by: SERVER-22722 Ranged regex uses inefficient indexBounds (Closed)

Assignee: [DO NOT USE] Backlog - Query Optimization
Reporter: Jeff lee
Participants: [DO NOT USE] Backlog - Query Optimization, Guy Arad, Jeff lee
Votes: 3
Watchers: 12

Created: Jun 14 2013 07:13:02 PM UTC
Updated: Dec 06 2022 05:20:13 AM UTC
