[SERVER-7676] find usings sort,skip,limit where sort is on an unindexed attribute produces an error when executed from mongos command prompt but works correctly from mongod command prompt. Created: 15/Nov/12  Updated: 16/Nov/21  Resolved: 07/Nov/14

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: 2.2.1
Fix Version/s: 2.8.0-rc0

Type: Bug Priority: Major - P3
Reporter: Dale C Quantz Assignee: David Storch
Resolution: Done Votes: 2
Labels: query_triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

cluster with a single node 3 server replica set


Attachments: Zip Archive datagen.zip     File exec_knob.js    
Issue Links:
Depends
depends on SERVER-7694 external sort for find command Closed
Related
is related to SERVER-14299 For sharded limit=N queries with sort... Closed
Operating System: ALL
Steps To Reproduce:

 
db.Device.find ({AccountId : 223830}).sort( {DeviceName : 1}).skip(50000).limit(20)
 
QA1MONGOS1:
mongos> db.Device.find ({AccountId : 223830}).sort( {DeviceName : 1}).skip(50000).limit(20)
error: { "$err" : "error querying server", "code" : 15988 }
 
 
QA1MONGODB2:
PRIMARY> db.Device.find ({AccountId : 223830}).sort( {DeviceName : 1}).skip(50000).limit(20)
Successfully returns the data
 
Log snippet
 
Wed Nov 14 15:44:49 [conn384] recv(): message len 318953456 is too large318953456
Wed Nov 14 15:44:49 [conn384] DBClientCursor::init lazy say() failed
Wed Nov 14 15:44:49 [conn384] DBClientCursor::init message from say() was empty
Wed Nov 14 15:44:49 [conn384] got not master for: 172.16.170.142:20007
Wed Nov 14 15:44:49 [conn384] warning: db exception when finishing on rs0:rs0/172.16.170.141:20007,172.16.170.142:20007,172.16.170.143:20007, current connection state is { state: { conn: "rs0/172.16.170.141:20007,172.16.170.142:20007,172.16.170.143:20007", vinfo: "Reporting.Device @ 1|50||5085a74f60f379d79eeaac5b", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server

Sprint: Query 2.7.8
Participants:

 Description   

create a collection of documents that have at least one attribute that is not indexed. The resulting size of all the documents must be as close to 32MB limit as possible



 Comments   
Comment by Chad Kreimendahl [ 15/Apr/16 ]

Given the new YAML format, WHERE in the configuration file would "internalQueryExecMaxBlockingSortBytes" go? It's not a good long term solution for us... possibly making support for MORE THAN 2 indexes for sorting and filter (or even 2 each for the time being) would do?

Why not support say 8 or 64 or 128 indexes like SQL would?

Comment by Janice Parkinson [ 17/Nov/14 ]

Hi Dave,

Thanks for the quick response. Parameter change works lovely for me for the issue as described here. I will just have to get this into our dev/test environments now for a wider exercise. And then watch for the official 2.8 release!

Thanks so much,
Janice

Comment by David Storch [ 17/Nov/14 ]

Hi jparkinson@absolute.com,

The query system's memory usage limit for sorts can be configured via the internalQueryExecMaxBlockingSortBytes parameter. The sort memory limit can be set to x bytes from the mongo shell as follows:

> db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes: x})

Here, x should be replaced with an integer. The default value of the parameter is 33554432 (32 MB).

Alternately, you can specify the parameter using a command line flag or in a config file. The syntax for the command line flag is:

./mongod --setParameter internalQueryExecMaxBlockingSortBytes=x

Best,
Dave

Comment by Janice Parkinson [ 17/Nov/14 ]

Testing on Windows.
db.version() -> 2.8.0-rc0
execute: db.Device.find ({}).sort(

{DeviceName : 1}

).skip(50000).limit(20)
result:
error: {
"$err" : "Executor error: Overflow sort stage buffered data usage of 33554654 bytes exceeds internal limit of 33554432 bytes",
"code" : 17144
}
What is the configuration option to take advantage of this fix?

Comment by Githook User [ 07/Nov/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-7676 make sort memory limit configurable
Branch: master
https://github.com/mongodb/mongo/commit/39ded947b914652032fb5a9b3ee991c00e173664

Comment by Valentin Tzvetkov [ 17/Sep/13 ]

To recap the info from Dale's postings above:

We made the following 3 changes to the mongo source:
.../src/mongo/bson/util/builder.h
line 57
//const int BufferMaxSize = 64 * 1024 * 1024;
const int BufferMaxSize = 1024 * 1024 * 1024;

.../src/mongo/db/scanandorder.cpp
line 28
//const unsigned ScanAndOrder::MaxScanAndOrderBytes = 32 * 1024 * 1024;
const unsigned ScanAndOrder::MaxScanAndOrderBytes = 1024 * 1024 * 1024;

.../src/mongo/util/net/message_port.cpp
line 164
//if ( len < 16 || len > 48000000 ) { // messages must be large enough for headers
if ( len < 16 || len > 1024*1024*1024 ) { // messages must be large enough for headers

Those three changes combined made the errors reported above disappear and we've been using that custom build problem free for 10 months now.
The problem is with the custom build we are stuck at release 2.2.1.
I realise the changes are trivial. Still it is not practical for us to repeat this with each MongoDb release as it incurs too much QA overhead.
What we would like to see is those three buffer sizes to be made configurable so we can tailor it to our needs without the need of a custom build.

Comment by Dale C Quantz [ 14/Dec/12 ]

We allow sorts on all the attributes so adding a key for each is just not and option. The write performance would be unacceptable.

There are a couple of suggestions I would like to get some feedback on.

1. Allow buffers sizes to be in configuration (scanandsort, recv) to keep people from having to modify the source(use defaults if buffer.config file not found otherwise override buffer sized indicated in the file)
2. When you run a query on a large data set across multiple nodes the result you want to sort is bound to be large so do the following
db.mycollection.find(

{ AccountId: "23432234"}

).sort(

{ DeviceName : 1 }

).skip(300000).limit(200);

a. run filtered query on each shard
b. return ONLY the filter and sort fields to the mongos
c. merge and sort just the filter and sort fields on the mongos
d. select the documents from within the sort list that match the skip and limit values and and pull the rest of the fields from the various shards that contain these documents

The result of this is that the recv buffers size would only need to be large enough to hold the key and sort attributes. Another change to consider is to prefix the stream of data with a heads up token when the data anticipated is > then the normally allocated buffer. This way you can allocate a larger buffer only when necessary (still have some maximum relative to the size of the total system memory).

Comment by Dale C Quantz [ 17/Nov/12 ]

we have 70 attributes we can sort on. Adding and index would kill update performance to the point of being unusable. I took out the recv limit that was hard coded to 48000000 and made it match my scanandsort size. It seems to be working as a short term solution. We will have to do performance testing on putting all the attributes into an array and creating a multi-key index on it

Comment by Daniel Pasette (Inactive) [ 16/Nov/12 ]

Hi Dale, I don't think this is related to the change from standalone to replica set backing the shard.

On a single mongod, when you do a sort on an unindexed field, the entire result set needs to be held in memory for the sort. This can be mitigated if you use a limit with an unindexed sort, because mongod only needs to keep a sorted list equal to the number of results specified in the limit. However, when you use a skip and a limit, mongod needs to keep track of up to the skip plus the limit.

When you run the exact same query on mongos, the skip portion is removed before it is forwarded to the shard. This is because you cannot perform the skip until all the results are merged in the mongos. This can push the query over the maximum amount of data allowed for an unindexed sort.

This is why you're seeing the same query with a large skip work on a single mongod fail on the mongos. One workaround is to raise an index on the sort key.

Comment by Dale C Quantz [ 16/Nov/12 ]

Yes, it seems that the mongos tries to allocate on the recv more memory that it is allowed to. All our servers have 128GB of RAM on them and it is only trying to allocate 318MB and we are running just a single test.

We made 2 mods to the mongo source
scanandorder.cpp

const unsigned ScanAndOrder::MaxScanAndOrderBytes = 1024 * 1024 * 1024;
builder.h
const int BufferMaxSize = 1024 * 1024 * 1024;

We did not change the bson max size because we are always limiting the results sets we return to the driver.

Comment by Adam Comerford [ 16/Nov/12 ]

Dale - just to double check, does this accurately describe what you are seeing/doing:

1. Engineer a large sort, close to the 32MB limit
2. Run the sort on a mongos, with a single mongod behind it --> Success
3. Run the same sort on a mongos, with a replica set behind it --> Fail

Adam.

Comment by Dale C Quantz [ 15/Nov/12 ]

from the mongos run the following:

use admin
db.runCommand(

{ enablesharding : "stress" }

);
db.runCommand( { shardcollection : "stress.Post", key :

{ bigId : 1 }

} )
use stress
db.Post.ensureIndex(

{"bigId": 1}

)
db.Post.find(

{"bigId": 1}

)

run the test app attached
you need to run this data generator on a windows system with .net 4 installed(sorry)

run these commands from the mongos
use stress
db.Post.find({}).sort(

{name : -1}

).skip(50000).limit(100);

you should see this:
error:

{ "$err" : "error querying server", "code" : 15988 }

server config is 3 node replica set and a single node mongos.

Comment by Dale C Quantz [ 15/Nov/12 ]

Anecdotally, the query worked fine when the cluster was a single mongod behind a mongos. The issue surfaced when we setup a replica set behind the mongos.

Generated at Thu Feb 08 03:15:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.