[SERVER-14228] Setting batchSize and sort on a cursor in sharded collection causes fewer than all documents to be returned Created: 11/Jun/14  Updated: 10/Dec/14  Resolved: 12/Jun/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.8, 2.4.10
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jason Liszka Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-12438 batch size with an unindexed sort in ... Closed
Related
is related to SERVER-6015 using sort() with batchSize() never r... Closed
is related to SERVER-7267 Batch size + sort closes the cursor Closed
is related to SERVER-14174 If ntoreturn is a limit (rather than ... Closed
is related to SERVER-5374 batchSize is a hard limit for an in m... Closed
Operating System: ALL
Steps To Reproduce:

This reproduces on a single instance:

> db.foo.drop()
> for (i=0; i<1000; i++) {db.foo.insert({x:i})}
> db.foo.find().batchSize(100).itcount()
1000
> db.foo.find().sort({x:1}).batchSize(100).itcount()
100

Participants:

 Description   

Here is the correct number of documents:

mongos> db.ad_campaigns.find({fs:2}).count()
4587

If you specify a batchSize and a sort, you get fewer than all the documents:

mongos> function cursorCount(query, batchSize) { var count = 0; var cursor = db.ad_campaigns.find(query).batchSize(batchSize); while(cursor.hasNext()) { cursor.next(); count++; }; return count }
mongos> cursorCount({fs:2}, 100)
3504

The number of documents returned appears to be K + batchSize, for some K:

mongos> cursorCount({fs:2}, 101)
3505
mongos> cursorCount({fs:2}, 102)
3506

K (3404 in this example) happens to match the number of documents matching the query on one shard (ads1):

ads1> db.ad_campaigns.find({fs:2}).count()
3404

This relationship between K and the number of records on ads1 is borne out on other queries:

mongos> db.ad_campaigns.find({fs:5}).count()
9764
mongos> cursorCount({fs:5}, 100)
5249
ads1> db.ad_campaigns.find({fs:5}).count()
5149
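The K + batchSize relationship from both examples can be checked directly (an editor's sanity check of the arithmetic above, not MongoDB code):

```python
# Observed itcounts from the queries above, alongside the ads1 shard's
# matching-document counts: each result is the ads1 count plus one batch.
observations = [
    # (batchSize, ads1_count, observed_itcount)
    (100, 3404, 3504),  # fs:2
    (101, 3404, 3505),
    (102, 3404, 3506),
    (100, 5149, 5249),  # fs:5
]
for batch_size, k, observed in observations:
    assert observed == k + batch_size
print("K + batchSize holds for all observations")
```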

If the batch size is not specified or the cursor is not sorted, the problem goes away:

mongos> db.ad_campaigns.find({fs:2}).count()
4587
mongos> cursorCount({fs:2}, 2000)
4587
mongos> var count = 0; var cursor = db.ad_campaigns.find({fs:2}).batchSize(100); while(cursor.hasNext()) { cursor.next(); count++; }; count
4587



 Comments   
Comment by David Storch [ 12/Jun/14 ]

jliszka, glad I could help!

Comment by Jason Liszka [ 12/Jun/14 ]

Hinting {_id: 1} fixed the problem. I don't think we need a backport to 2.4.x. We can work around it by not setting batchSize on the cursor.

mongos> db.ad_campaigns.find({fs:2}).sort({_id:1}).itcount()
4626
mongos> db.ad_campaigns.find({fs:2}).sort({_id:1}).batchSize(100).itcount()
3544
mongos> db.ad_campaigns.find({fs:2}).sort({_id:1}).batchSize(100).hint({_id:1}).itcount()
4626

Thanks!

Comment by David Storch [ 12/Jun/14 ]

Hi jliszka,

Thanks for your prompt response, and for providing the explain output. From the output you provided, I can see that the ads0 shard is using index {fs: 1, nt: 1}, whereas ads1 and ads2 are using the _id index. Due to the bug fixed in SERVER-12438, ads0 will only return a single batch (it is doing a topK sort for batchSize + unindexed sort). On the other hand, ads1 is getting the sort order from an index scan, and is therefore returning all batches as expected. Based on this additional information, I do believe that this is a duplicate of SERVER-12438.

> However, it does appear to be fixed in 2.6.0. Do you think the root cause is the same?

Yes, the root cause is the same. Glad to hear that you have verified the fix in 2.6.0.

> What do you suggest as a workaround?

If possible, hinting {_id: 1} could be a workaround. This will force the shards to obtain the sort via an index scan, and prevent missing batches due to erroneous topK sorts on any of the shards. Note that this will cause the entire _id index to be scanned on each shard, so this workaround should only be used if the performance impact of a full index scan is acceptable.

> Is there any chance of a backport of the fix to 2.4.x?

The fix in 2.6 was part of a much larger rewrite of the query engine. Therefore, it cannot be trivially backported to 2.4. A fix in 2.4 would require an entirely new patch (and may be difficult to engineer due to some larger abstraction and design issues that were resolved in 2.6). Is a fix in 2.4 critical for your application?

Best,
Dave

Comment by Jason Liszka [ 12/Jun/14 ]

Hi Dave,

Thanks for the explanation. I believe the behavior described in https://jira.mongodb.org/browse/SERVER-12438 differs from what we're seeing in a couple of ways:

1. The query is using an index (both fs and _id are indexed in the example I provided, see explain() output below), and
2. We're getting more than one batch of results. In my example, I get 3,504 documents with a batch size of 100. It appears I am getting all the results from one shard plus one batch.

However, it does appear to be fixed in 2.6.0. Do you think the root cause is the same? What do you suggest as a workaround? Is there any chance of a backport of the fix to 2.4.x?

Here's the explain() output. Worth noting that we get all the results from the ads1 shard (3434 of them at this point in time), plus one batch.

mongos> db.ad_campaigns.find({fs:2}).sort({_id:1}).explain()
{
	"clusteredType" : "ParallelSort",
	"shards" : {
		"ads0/fsaa19:27400,fsad15:27400,fsai10:27400,fsap1:27400" : [
			{
				"cursor" : "BtreeCursor fs_1_nt_1",
				"isMultiKey" : false,
				"n" : 1182,
				"nscannedObjects" : 1182,
				"nscanned" : 1182,
				"nscannedObjectsAllPlans" : 3547,
				"nscannedAllPlans" : 3547,
				"scanAndOrder" : true,
				"indexOnly" : false,
				"nYields" : 3,
				"nChunkSkips" : 0,
				"millis" : 31,
				"indexBounds" : {
					"fs" : [
						[
							2,
							2
						]
					],
					"nt" : [
						[
							{
								"$minElement" : 1
							},
							{
								"$maxElement" : 1
							}
						]
					]
				},
				"server" : "fsap1.prod.foursquare.com:27400"
			}
		],
		"ads1/fsac29:27401,fsad12:27401,fsai19:27401" : [
			{
				"cursor" : "BtreeCursor _id_",
				"isMultiKey" : false,
				"n" : 3434,
				"nscannedObjects" : 11162,
				"nscanned" : 11162,
				"nscannedObjectsAllPlans" : 12316,
				"nscannedAllPlans" : 12316,
				"scanAndOrder" : false,
				"indexOnly" : false,
				"nYields" : 96,
				"nChunkSkips" : 0,
				"millis" : 116,
				"indexBounds" : {
					"_id" : [
						[
							{
								"$minElement" : 1
							},
							{
								"$maxElement" : 1
							}
						]
					]
				},
				"server" : "fsad12:27401"
			}
		],
		"ads2/fsaa29:27402,fsab8:27402,fsai26:27402" : [
			{
				"cursor" : "BtreeCursor _id_",
				"isMultiKey" : false,
				"n" : 0,
				"nscannedObjects" : 0,
				"nscanned" : 0,
				"nscannedObjectsAllPlans" : 0,
				"nscannedAllPlans" : 0,
				"scanAndOrder" : false,
				"indexOnly" : false,
				"nYields" : 0,
				"nChunkSkips" : 0,
				"millis" : 3,
				"indexBounds" : {
					"_id" : [
						[
							{
								"$minElement" : 1
							},
							{
								"$maxElement" : 1
							}
						]
					]
				},
				"server" : "fsab8:27402"
			}
		]
	},
	"cursor" : "multiple",
	"n" : 4616,
	"nChunkSkips" : 0,
	"nYields" : 99,
	"nscanned" : 12344,
	"nscannedAllPlans" : 15863,
	"nscannedObjects" : 12344,
	"nscannedObjectsAllPlans" : 15863,
	"millisShardTotal" : 150,
	"millisShardAvg" : 50,
	"numQueries" : 3,
	"numShards" : 3,
	"millis" : 561
}

Comment by David Storch [ 12/Jun/14 ]

Hi jliszka,

Thanks for reporting the issue. This appears to be a known bug in 2.4.x versions which has been fixed in 2.6.0 under SERVER-12438 (if you are interested, the fix is in this commit). Read on for some of the details of the resolution of SERVER-12438.

The MongoDB wire protocol passes both the client-side batchSize and limit values to the server in a single "ntoreturn" field. As a consequence of this wire protocol limitation, the server cannot tell whether to interpret ntoreturn as a batchSize or as a limit.

Usually, the server logic is the same for both meanings. However, when an in-memory sort is required, batchSize with sort must be handled differently than limit with sort. In the limit-with-sort case, we know that we only ever need to return ntoreturn results, so the server performs a topK sort. This is more efficient than a full sort and requires less memory. However, if ntoreturn means batchSize, then a full sort is required. 2.4.x versions erroneously assumed that ntoreturn was a limit: they did a topK sort and then closed the cursor without returning further results, which is the bug reported here. In 2.6, the server does a topK sort for the first batch but leaves the cursor open; if the client requests further batches, it switches over to a full in-memory sort in order to return them.
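As a rough illustration (an editor's sketch of the behavior just described, not actual MongoDB server code): a shard whose plan needs an in-memory sort treats ntoreturn as a limit, returns one topK batch, and closes its cursor, while a shard that reads the sort order from an index returns everything, so mongos ends up with one shard's full result set plus a single batch.

```python
# Simplified model (assumed, for illustration only) of the 2.4.x bug.
def shard_results(matching_docs, ntoreturn, indexed_sort):
    docs = sorted(matching_docs)
    if indexed_sort:
        return docs              # index provides order: all batches returned
    return docs[:ntoreturn]      # buggy topK: one batch, then cursor closed

def mongos_itcount(shards, batch_size):
    # mongos merges whatever each shard hands back
    return sum(len(shard_results(docs, batch_size, indexed))
               for docs, indexed in shards)

# Mirroring the numbers in the Description: ads1 holds 3404 matches and has
# an indexed sort; the remaining shards hold the other 1183 matches but need
# an in-memory sort.
shards = [(range(3404), True), (range(1183), False)]
print(mongos_itcount(shards, 100))  # 3504, i.e. K + batchSize
```

Once the batch size exceeds every in-memory-sorting shard's match count, the topK batch happens to contain all of that shard's results, which is why cursorCount with a batch size of 2000 returned the full count above.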

I hope this explanation was helpful, and please feel free to reach out with further questions. To address your point above:

> The single-instance repro in "Steps to Reproduce" is https://jira.mongodb.org/browse/SERVER-7267, which is not what we're seeing here. That bug is about only getting one batch when you sort on a non-indexed field. If you add an index on x in your repro steps, the problem should go away.
>
> In this bug, even if you sort on an indexed field (in my example we're sorting on _id), you get one shard's worth plus one batch of documents.

You are correct that this problem only applies when the plan selected involves an unindexed sort. I suspect that in your case, the server is choosing to use an index on the "fs" field rather than scanning the _id index in order to obtain the desired sort order. Can you confirm by posting the explain output for the problematic query?

Best,
Dave

Comment by Jason Liszka [ 11/Jun/14 ]

The single-instance repro in "Steps to Reproduce" is https://jira.mongodb.org/browse/SERVER-7267, which is not what we're seeing here. That bug is about only getting one batch when you sort on a non-indexed field. If you add an index on x in your repro steps, the problem should go away.

In this bug, even if you sort on an indexed field (in my example we're sorting on _id), you get one shard's worth plus one batch of documents.

Comment by Ramon Fernandez Marina [ 11/Jun/14 ]

Thanks for reporting this jliszka, we can reproduce this behavior and we're investigating.

Comment by Jason Liszka [ 11/Jun/14 ]

Mis-pasted the cursorCount function, should be:

function cursorCount(query, batchSize) {
  var count = 0;
  var cursor = db.ad_campaigns.find(query).batchSize(batchSize).sort({_id:1});
  while(cursor.hasNext()) { cursor.next(); count++; };
  return count;
}

Generated at Thu Feb 08 03:34:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.