[SERVER-14183] $in with limit() scans more than necessary Created: 06/Jun/14  Updated: 10/Dec/14  Resolved: 09/Jun/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Kevin Pulo Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

db.short.drop();
db.short.ensureIndex({a:1});
for(i=0; i < 100; i++) db.short.insert({a:i});
 
function check(exp, limit, _case) {
	//print(exp.nscanned);
	assert.eq(exp.n, limit, "n is " + exp.n + ", should be " + limit + " " + _case + " " + tojson(exp));
	assert.eq(exp.nscanned, limit, "nscanned is " + exp.nscanned + ", should be " + limit + " " + _case + " " + tojson(exp));
}
 
function check_limit(limit) {
	var query = {a: { $in: [ 4, 2, 6, 5, 3, 1, 10, 15, 22 ] } };
 
	check(db.short.find(query).limit(limit).explain(), limit, "(positive limit and no sort)");
	check(db.short.find(query).limit(-limit).explain(), limit, "(negative limit and no sort)");
	check(db.short.find(query).sort({a:1}).limit(limit).explain(), limit, "(positive limit and positive sort)");
	check(db.short.find(query).sort({a:1}).limit(-limit).explain(), limit, "(negative limit and positive sort)");
	check(db.short.find(query).sort({a:-1}).limit(limit).explain(), limit, "(positive limit and negative sort)");
	check(db.short.find(query).sort({a:-1}).limit(-limit).explain(), limit, "(negative limit and negative sort)");
}
 
check_limit(1);
//check_limit(2);
//check_limit(3);
//check_limit(4);

Participants:

 Description   

When using an $in clause which has more entries than specified by limit() (with an index), the server can stop scanning the index and return the results as soon as the desired number of results have been obtained. This is particularly useful when using $in with limit(1), ie. "just find me a document that matches one of these".

The limit(1) case works (ie. nscanned == limit) in version 2.4, but not 2.6. For limits > 1, both versions 2.4 and 2.6 scan slightly more than necessary (nscanned > limit), but 2.4 is a little better than 2.6.



 Comments   
Comment by David Storch [ 09/Jun/14 ]

The queries given in the repro steps above are solved as an IndexScan stage whose parent is a Fetch stage whose parent is a Limit stage. For those queries that specify a sort, there is no explicit Sort stage, as the sort order is obtained via the index scan.

The Limit stage will return an EOF indicator as soon as it has passed as many results as requested by the limit up to its caller. Therefore, the server does in fact stop scanning the index as soon as the desired number of results are returned.

The nscanned values reported for these examples reflect internal behavior of the index scan stage (i.e. how many keys the index scan stage had to examine). The nscanned value may be incremented for internal traversals between keys done in order to jump between index ranges. It is not necessarily equal to the number of matching keys. The order of magnitude of the nscanned value is much more important than precisely what the number is. You can see that the limit is being enforced correctly by doing something like this:

> t.drop()
> t.ensureIndex({a: 1})
> for (var i = 0; i < 10000; i++) { t.insert({a: i}); }
> for (var i = 2500; i < 7500; i++) { vals.push(i); }
> t.find({a: {$in: vals}}).limit(4).explain().nscanned
5

If we continued to scan the index beyond what is required by the limit, we would expect nscanned to be closer to the number of matching documents (approximately 5000). Instead it is only 5, just one greater than the limit value.

I hope that was helpful, and please let me know if there are further questions.

Generated at Thu Feb 08 03:34:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.