[SERVER-14299] For sharded limit=N queries with sort, mongos can request >N results from shard Created: 18/Jun/14  Updated: 11/Jul/16  Resolved: 07/Jul/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 2.6.6, 2.7.4

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File getmore_runner_error.js    
Issue Links:
Depends
Related
related to SERVER-7676 find usings sort,skip,limit where sor... Closed
related to SERVER-14306 mongos can cause the in-memory sort l... Closed
Operating System: ALL
Backport Completed:
Participants:

 Description   

If a user issues a limit=N query that includes a sort against a sharded collection and all of the first N results come back from one shard, mongos will request an extra batch of results from that shard.

This bug has a particularly harmful interaction with the "top-K sort" logic introduced in version 2.6.0 of the server. If mongos requests >N results from a shard for a query that has a limit of N and uses an unindexed sort, then the shard will perform a full "unlimited" sort (and return "getMore runner error: Overflow sort stage buffered data usage..." to the user if the size of the data to be sorted exceeds the 32MB sort stage limit; see script attached to this ticket for an example that exhibits this failure mode).

Reproduce with the following:

var st = new ShardingTest({shards: 2, other: {shardOptions: {verbose: 1}}});
st.stopBalancer();
var coll = st.s0.getDB("test").foo;
 
assert.commandWorked(coll.getDB().adminCommand({enableSharding: coll.getDB().getName()}));
assert.commandWorked(coll.getDB().adminCommand({movePrimary: coll.getDB().getName(),
                                                to: "shard0000"}));
assert.commandWorked(coll.getDB().adminCommand({shardCollection: coll.getFullName(),
                                                key: {_id: 1}}));
assert.commandWorked(coll.getDB().adminCommand({split: coll.getFullName(), middle: {_id: 0}}));
assert.commandWorked(coll.getDB().adminCommand({moveChunk: coll.getFullName(),
                                                find: {_id: 0},
                                                to: "shard0001"}));
 
// Write 10 documents to shard 0, and 10 documents to shard 1.
for (var i=1; i<=10; ++i) {
    assert.writeOK(coll.insert({_id: i, x: i, y: i}));
    assert.writeOK(coll.insert({_id: -i, x: -i, y: i}));
}
 
coll.find().sort({x:1}).limit(9).itcount(); // Shard 0 contributes all 9 results.
coll.find().sort({y:1}).limit(9).itcount(); // Both shards contribute to the 9 results.

The query on line 21 requests 9 documents from both shards, and then an unnecessary additional batch from shard 0:

 m30000| 2014-06-18T17:36:14.380-0400 [conn6] query test.foo query: { query: {}, orderby: { x: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:70249920950 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:201 nreturned:9 reslen:380 0ms
 m30001| 2014-06-18T17:36:14.380-0400 [conn4] query test.foo query: { query: {}, orderby: { x: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:23467375332 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:214 nreturned:9 reslen:380 0ms
 m30000| 2014-06-18T17:36:14.380-0400 [conn4] getmore test.foo cursorid:70249920950 ntoreturn:9 keyUpdates:0 numYields:0 locks(micros) r:133 nreturned:1 reslen:60 0ms

Whereas the query on line 22 requests 9 documents from both shards and does not follow up with any getmore requests:

 m30000| 2014-06-18T17:36:14.381-0400 [conn6] query test.foo query: { query: {}, orderby: { y: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:70393723465 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:162 nreturned:9 reslen:380 0ms
 m30001| 2014-06-18T17:36:14.382-0400 [conn4] query test.foo query: { query: {}, orderby: { y: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:22786735268 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:149 nreturned:9 reslen:380 0ms



 Comments   
Comment by Githook User [ 19/Nov/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-14299 prevent mongos from requesting unnecessary batches for sort + limit queries

(cherry picked from commit 8b8a118fc965231293ad56c53a837ffacb17132f)

Conflicts:
src/mongo/db/query/new_find.cpp
Branch: v2.6
https://github.com/mongodb/mongo/commit/afc1007c3f96f4bdcf890a779469bab44e5c333f

Comment by Githook User [ 07/Jul/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-14299 prevent mongos from requesting unnecessary batches for sort + limit queries
Branch: master
https://github.com/mongodb/mongo/commit/8b8a118fc965231293ad56c53a837ffacb17132f

Generated at Thu Feb 08 03:34:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.