[SERVER-3122] Range queries on compound indexes involve unnecessary scanning when sorting and limit are involved Created: 19/May/11  Updated: 28/Jan/15  Resolved: 04/Sep/11

Status: Closed
Project: Core Server
Component/s: Performance, Querying
Affects Version/s: 1.8.1, 1.9.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexey Borzenkov Assignee: Aaron Staple
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongo-range-query.js    
Issue Links:
Duplicate
duplicates SERVER-3758 push limit down into index scan for s... Backlog
Participants:

 Description   

If there is a compound index and you query a range on the first key while sorting on the last key, MongoDB doesn't take the limit into account and does a lot of unnecessary scanning.

For example, if I have an index on {i:1, j:1} and I query with a range on i and sort on j with limit 10, then explain's nscanned shows that the whole range from [i1, jmin] to [i2, jmax] was scanned, even though for each i only the first 10 entries need to be scanned and the rest could be skipped. Attached is an example script that generates the following output:

MongoDB shell version: 1.9.0
connecting to: test
{
	"cursor" : "BtreeCursor i_1_j_1",
	"nscanned" : 3000,
	"nscannedObjects" : 3000,
	"n" : 10,
	"scanAndOrder" : true,
	"millis" : 6,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"i" : [
			[
				2,
				4
			]
		],
		"j" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		]
	}
}

If BtreeCursor knew that it is serving a sorted query with a particular limit, it could perhaps optimize this case by advancing to the next group of keys once `limit` objects have been scanned that share the same prefix of index fields preceding the sort field (I hope I'm describing this correctly).
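The idea can be sketched outside the server. The following is a toy model in plain JavaScript, not MongoDB's BtreeCursor: index entries for {i:1, j:1} are a sorted array of (i, j) pairs, and the hypothetical skip-ahead scans at most `limit` entries per distinct i before jumping to the next i, while the naive scan visits the whole range (function names and data shape are illustrative):

```javascript
// Toy model of a compound index {i:1, j:1}: entries pre-sorted by (i, j).
// Not MongoDB internals -- only an illustration of the proposed skip-ahead.
function naiveScan(entries, limit) {
  // Scans every entry in the i-range, then sorts (like scanAndOrder).
  const results = entries.slice();
  results.sort((a, b) => a.j - b.j);
  return { nscanned: entries.length, top: results.slice(0, limit) };
}

function skipScan(entries, limit) {
  // Scans at most `limit` entries per distinct value of i, then skips
  // ahead to the next i -- the optimization suggested above.
  let nscanned = 0;
  let idx = 0;
  const collected = [];
  while (idx < entries.length) {
    const currentI = entries[idx].i;
    let taken = 0;
    while (idx < entries.length && entries[idx].i === currentI && taken < limit) {
      collected.push(entries[idx]);
      nscanned++;
      taken++;
      idx++;
    }
    // Skip the remaining entries with the same i.
    while (idx < entries.length && entries[idx].i === currentI) idx++;
  }
  collected.sort((a, b) => a.j - b.j);
  return { nscanned, top: collected.slice(0, limit) };
}

// 3 values of i with 1000 values of j each, mirroring the shape of the
// attached script's data (nscanned 3000 in the explain output above).
const entries = [];
for (let i = 2; i <= 4; i++)
  for (let j = 0; j < 1000; j++) entries.push({ i: i, j: j });

const naive = naiveScan(entries, 10);
const skip = skipScan(entries, 10);
console.log(naive.nscanned, skip.nscanned); // 3000 vs 30
```

Both variants return the same top 10 by j; the skip-ahead just touches two orders of magnitude fewer index entries.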

In a real-life scenario I have a collection of "events" (each document records that an event of a particular "type" occurred, where "type" is a string) and an index on {type:1, _id:1}. In the web interface I need to show, e.g., the last 10 events of some types. Querying for a single specific type works perfectly, but when I try to show, say, the last 10 events across several types, MongoDB basically ends up using a BasicCursor, because that turns out to be faster.

In some places I currently work around this by running a separate query per type in parallel and merging the results on the client side. That works when the exact types are known in advance, but not when they aren't, e.g. when querying with a regex like /^prefix/ (and it's hard to find out which types exist, because distinct has a similar excessive-scanning problem).
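The client-side merge mentioned above can be sketched like this, with per-type result arrays standing in for per-type queries such as `db.events.find({type: t}).sort({_id: -1}).limit(n)` (the data and names are illustrative, not from the actual application):

```javascript
// Client-side merge of per-type query results, each already sorted by
// _id descending (newest first), keeping only the overall newest `limit`.
function mergeNewest(perTypeResults, limit) {
  const merged = [].concat.apply([], perTypeResults);
  merged.sort((a, b) => b._id - a._id); // descending _id = newest first
  return merged.slice(0, limit);
}

// Example data: two event types, each list newest-first.
const clicks = [{ _id: 9, type: "click" }, { _id: 5, type: "click" }];
const views  = [{ _id: 8, type: "view" },  { _id: 7, type: "view" }];

const newest = mergeNewest([clicks, views], 3);
console.log(newest.map(e => e._id)); // [9, 8, 7]
```

This keeps each per-type query fast on the {type:1, _id:1} index, at the cost of one round trip per type and a small merge on the client.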

Could you please improve this, or at least point me to where exactly MongoDB constructs cursors for sorted queries and where the sorting happens, so I can tinker with it myself?



 Comments   
Comment by Aaron Staple [ 06/Jun/11 ]

Hi Alexey,

You're right, in this case we could potentially scan fewer documents. If you want to tinker, you can look at FieldRangeVectorIterator::advance and have it skip to the next value of the first index key when we hit a specified count of values of the second index key. Right now it only skips to the next value of the first index key when the value for the second index key goes out of range.

Generated at Thu Feb 08 03:02:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.