[SERVER-19266] An error document is returned with result set Created: 02/Jul/15  Updated: 17/Mar/16  Resolved: 17/Dec/15

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: 2.6.10, 3.0.4
Fix Version/s: 2.6.12, 3.0.9

Type: Bug Priority: Major - P3
Reporter: Steven Hand Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Import Enron message data.

On a sharded cluster

mongos> db.messages.count()
120477
mongos> db.messages.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "enron.messages"
	},
	{
		"v" : 1,
		"key" : {
			"headers.From" : 1,
			"headers.Date" : 1
		},
		"name" : "headers.From_1_headers.Date_1",
		"ns" : "enron.messages"
	}
]
mongos> db.messages.find({},{_id:0,'headers.From':1,'headers.Subject':1,'headers.Date':1}).sort({'headers.Date':1}).batchSize(10);
{ "headers" : { "Date" : ISODate("2001-02-01T06:10:00Z"), "From" : "tracy.geaccone@enron.com", "Subject" : "Re: Transition Issues" } }
{ "headers" : { "Date" : ISODate("2001-02-01T06:17:00Z"), "From" : "stinson.gibner@enron.com", "Subject" : "Vacation day Feb. 16" } }
{ "headers" : { "Date" : ISODate("2001-02-01T06:17:00Z"), "From" : "stinson.gibner@enron.com", "Subject" : "Vacation day Feb. 16" } }
{ "headers" : { "Date" : ISODate("2001-02-01T06:17:00Z"), "From" : "stinson.gibner@enron.com", "Subject" : "Vacation day Feb. 16" } }
{ "headers" : { "Date" : ISODate("2001-02-01T06:53:00Z"), "From" : "stinson.gibner@enron.com", "Subject" : "Re: P+ spread options" } }
{ "headers" : { "Date" : ISODate("2001-02-01T06:53:00Z"), "From" : "stinson.gibner@enron.com", "Subject" : "Re: P+ spread options" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:00:00Z"), "From" : "matt.smith@enron.com", "Subject" : "Re: fun" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:00:00Z"), "From" : "matt.smith@enron.com", "Subject" : "Re: fun" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:00:00Z"), "From" : "matt.smith@enron.com", "Subject" : "Re: fun" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:07:00Z"), "From" : "phil.demoes@enron.com", "Subject" : "Re: Transco Z4/Z5 curves for Piedmont" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:07:00Z"), "From" : "phil.demoes@enron.com", "Subject" : "Re: Transco Z4/Z5 curves for Piedmont" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:20:00Z"), "From" : "mary.poorman@enron.com", "Subject" : "Re: Meter 986315 for 10/00" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:20:00Z"), "From" : "mary.poorman@enron.com", "Subject" : "Re: Meter 986315 for 10/00" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:20:00Z"), "From" : "mary.poorman@enron.com", "Subject" : "Re: Meter 986315 for 10/00" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:29:00Z"), "From" : "robin.rodrigue@enron.com", "Subject" : "VAR" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:29:00Z"), "From" : "robin.rodrigue@enron.com", "Subject" : "VAR" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:29:00Z"), "From" : "robin.rodrigue@enron.com", "Subject" : "VAR" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:29:00Z"), "From" : "robin.rodrigue@enron.com", "Subject" : "VAR" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:30:00Z"), "From" : "michael.tribolet@enron.com", "Subject" : "Re: LA Times article" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:30:00Z"), "From" : "michael.tribolet@enron.com", "Subject" : "Re: LA Times article" } }
Type "it" for more
mongos> it
{ "headers" : { "Date" : ISODate("2001-02-01T07:32:00Z"), "From" : "chris.germany@enron.com", "Subject" : "Re: Insurance dough" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:32:00Z"), "From" : "chris.germany@enron.com", "Subject" : "Re: Insurance dough" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:35:00Z"), "From" : "tori.kuykendall@enron.com", "Subject" : "" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:35:00Z"), "From" : "tori.kuykendall@enron.com", "Subject" : "" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:35:00Z"), "From" : "tori.kuykendall@enron.com", "Subject" : "" } }
{ "headers" : { "Date" : ISODate("2001-02-01T07:35:00Z"), "From" : "tori.kuykendall@enron.com", "Subject" : "" } }
Error: error: {
	"$err" : "getMore executor error: Overflow sort stage buffered data usage of 33557210 bytes exceeds internal limit of 33554432 bytes",
	"code" : 17406
}

Sprint: Sharding 9 (09/18/15), Sharding D (12/11/15), Sharding E (01/08/16)
Participants:

 Description   

A query against a sufficiently large sharded collection that requires a sort on an unindexed field, issued with a sufficiently small batch size, returns a result set containing batch-size documents plus an error document from each shard that returned results.
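Until the fix is deployed, a client could guard against this by inspecting each batch for error documents. The following is a minimal sketch in plain JavaScript, assuming that an error document is recognizable by its "$err" field, as in the output under Steps To Reproduce. The helper names isErrorDocument and partitionBatch are hypothetical and not part of the mongo shell or any driver.

```javascript
// Hypothetical helpers (assumption): not part of the mongo shell or any driver.
// A legacy error document carries a "$err" string, usually with a "code".
function isErrorDocument(doc) {
  return doc !== null && typeof doc === "object" &&
    Object.prototype.hasOwnProperty.call(doc, "$err");
}

// Split one batch into ordinary results and error documents.
function partitionBatch(batch) {
  const results = [];
  const errors = [];
  for (const doc of batch) {
    (isErrorDocument(doc) ? errors : results).push(doc);
  }
  return { results, errors };
}

// Sample batch modeled on the output shown under Steps To Reproduce.
const sampleBatch = [
  { headers: { From: "tori.kuykendall@enron.com", Subject: "" } },
  {
    "$err": "getMore executor error: Overflow sort stage buffered data usage " +
      "of 33557210 bytes exceeds internal limit of 33554432 bytes",
    code: 17406,
  },
];
const partitioned = partitionBatch(sampleBatch);
```

Here partitioned.errors holds the document with code 17406, which the caller can turn into an exception instead of treating it as a query result.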



 Comments   
Comment by Githook User [ 17/Dec/15 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-19266 An error document is returned with result set

Make sure that the error result flag is set when receiving one from one of the shards during GET_MORE.

(cherry picked from commit eb79bd1fe807512e54c68c5982b2a24fa1d66bba)

Conflicts:
src/mongo/client/parallel.cpp
Branch: v2.6
https://github.com/mongodb/mongo/commit/cac4b59579191300d46eefcb786ce0382aa68817

Comment by Githook User [ 16/Dec/15 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-19266 An error document is returned with result set

Make sure that the error result flag is set when receiving one from one of the shards during GET_MORE.
Branch: v3.0
https://github.com/mongodb/mongo/commit/eb79bd1fe807512e54c68c5982b2a24fa1d66bba

Comment by Andy Schwerin [ 07/Oct/15 ]

I have confirmed that this bug no longer occurs on master as a result of SERVER-15176. The error message for failed find commands leaves something to be desired, but that is a separate issue.

mongos> db.messages.find({},{_id:0,'headers.From':1,'headers.Subject':1,'headers.Date':1}).sort({'headers.Date':1}).batchSize(10);
assert: command failed: {
	"ok" : 0,
	"errmsg" : "Executor error during find command: OperationFailed Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
	"code" : 96
} : undefined
_getErrorWithCode@src/mongo/shell/utils.js:23:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:259:5
DBCommandCursor@src/mongo/shell/query.js:657:5
DBQuery.prototype._exec@src/mongo/shell/query.js:103:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:257:5
DBQuery.prototype.shellPrint@src/mongo/shell/query.js:500:17
shellPrintHelper@src/mongo/shell/utils.js:444:1
@(shell2):1:1
 
Error: command failed: {
	"ok" : 0,
	"errmsg" : "Executor error during find command: OperationFailed Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
	"code" : 96
} : undefined

Comment by Randolph Tan [ 17/Sep/15 ]

TODO:

  • Confirm that SERVER-15176 makes this problem go away.
  • Fix v3.0 so mongos will correctly propagate the error flag bit from mongod.

Comment by David Storch [ 05/Aug/15 ]

We expect this problem to be solved by the introduction of the find and getMore commands (see linked ticket SERVER-15176).

Comment by Randolph Tan [ 08/Jul/15 ]

schwerin, the getMore command is not yet implemented in mongos, so I will describe what needs to change in the code base as of v3.1.5:

1. ShardedClientCursor needs to keep track of the result flags in the responses from the different shards and, at a minimum, aggregate the error flag. It then needs to expose this information so mongos can properly propagate the result flags to users.
2. To write the jstest, we also need to expose the hasResultFlag method to the shell so we can use it to check whether the flag we are looking for is set.
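
The first step above can be illustrated with a small sketch. It is written in JavaScript for readability and is not the actual C++ change. The class ShardFlagTracker and its recordShardReply method are hypothetical stand-ins for the proposed bookkeeping in ShardedClientCursor. The flag values are the OP_REPLY responseFlags bits from the wire protocol.

```javascript
// OP_REPLY responseFlags bits, as defined by the MongoDB wire protocol.
const ResultFlag = Object.freeze({
  CursorNotFound: 1 << 0,
  ErrorSet: 1 << 1, // called QueryFailure in the wire protocol docs
  ShardConfigStale: 1 << 2,
  AwaitCapable: 1 << 3,
});

// Hypothetical stand-in (assumption) for the per-cursor bookkeeping
// proposed for ShardedClientCursor.
class ShardFlagTracker {
  constructor() {
    this.flags = 0;
  }

  // Record the responseFlags of one shard reply. Only the error bit is
  // aggregated here, which is the minimum the comment asks for.
  recordShardReply(responseFlags) {
    this.flags |= responseFlags & ResultFlag.ErrorSet;
  }

  // Mirrors the shape of the hasResultFlag check mentioned in step 2.
  hasResultFlag(flag) {
    return (this.flags & flag) !== 0;
  }
}

// Two shards reply to a getMore: one succeeds, one reports an error.
const tracker = new ShardFlagTracker();
tracker.recordShardReply(ResultFlag.AwaitCapable);
tracker.recordShardReply(ResultFlag.ErrorSet | ResultFlag.AwaitCapable);
const mergedReplyHasError = tracker.hasResultFlag(ResultFlag.ErrorSet);
```

With this aggregation, an error from any one shard marks the merged reply as failed, which is the information mongos was dropping.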

Comment by Andy Schwerin [ 08/Jul/15 ]

renctan, how involved is the fix? If we upconvert OP_GETMORE to the getMore command on master, we'll be using a different code path. It would be good to get a minimal regression js test written.

Comment by Randolph Tan [ 07/Jul/15 ]

Note: the error happens only during getMore because of the nToReturn hack, which generates a plan with a top-k sort ORed with the normal plan. The initial batch uses the top-k sort, and execution switches to the normal plan once it realizes that the actual query requires more than k results.
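
The difference between the two branches can be modeled roughly as follows. This is a simplified illustration and not the server's planner or sort stage: the functions topKSort and fullSort and the synthetic 1 MiB documents are assumptions. Only the 33554432-byte limit comes from the error message in this ticket.

```javascript
// Limit taken from the error message in this ticket.
const SORT_LIMIT_BYTES = 33554432;

// Simplified top-k sort (assumption): never buffers more than k documents
// once the largest key has been evicted.
function topKSort(docs, k, limitBytes) {
  const buffer = [];
  let bytes = 0;
  for (const doc of docs) {
    buffer.push(doc);
    bytes += doc.size;
    buffer.sort((a, b) => a.key - b.key);
    if (buffer.length > k) {
      bytes -= buffer.pop().size; // evict the document with the largest key
    }
    if (bytes > limitBytes) {
      throw new Error(`sort buffered ${bytes} bytes, limit is ${limitBytes}`);
    }
  }
  return buffer;
}

// Simplified full sort (assumption): buffers every input document.
function fullSort(docs, limitBytes) {
  const buffer = [];
  let bytes = 0;
  for (const doc of docs) {
    bytes += doc.size;
    if (bytes > limitBytes) {
      throw new Error(`sort buffered ${bytes} bytes, limit is ${limitBytes}`);
    }
    buffer.push(doc);
  }
  return buffer.sort((a, b) => a.key - b.key);
}

// Forty synthetic documents of 1 MiB each, with keys in descending order.
const docs = Array.from({ length: 40 }, (_, i) => ({ key: 39 - i, size: 1048576 }));

// The first batch (k = 10) stays well under the limit.
const firstBatch = topKSort(docs, 10, SORT_LIMIT_BYTES);

// Asking for everything forces the full sort, which overflows.
let fullSortError = null;
try {
  fullSort(docs, SORT_LIMIT_BYTES);
} catch (e) {
  fullSortError = e;
}
```

In this model the first batch is served from a buffer of at most ten documents, while the follow-up request needs the full sort and fails once the buffered data passes the limit, matching the behavior seen on getMore.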

Comment by Randolph Tan [ 07/Jul/15 ]

It looks like mongod has the error set, but mongos ignores it:

https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/s/strategy.cpp#L625

On the other hand, the error flag is set on a getMore response that involves a single shard, since mongos simply passes along what it got from the shard:

https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/s/strategy.cpp#L593
