[SERVER-12708] Aggregation failed spill in sharded cluster does not return error
Created: 12/Feb/14  Updated: 11/Jul/16  Resolved: 19/Feb/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Sharding
Affects Version/s: None
Fix Version/s: 2.6.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Andrew Emil (Inactive)
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 369_helpers.js, ms2.out, multi_spill_2.js
Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

When an aggregation command is run against a sharded cluster with allowDiskUse: false and the pipeline would need to spill to disk, it seems possible for a non-error document to be returned instead of an error.

The test that reproduces this is "multi_spill_2.js". I am running it with the following command:

python buildscripts/smoke.py --mongo=/Users/ace/neweng/mongo/mongo --mongod=/Users/ace/neweng/mongo/mongod --continue-on-failure --test-path=/Users/ace/repro --mode=files /Users/ace/repro/multi_spill_2.js > ms2.out

Snippets from the output:

 m30000| 2014-02-12T14:17:34.261-0800 [conn10] getmore aggShard.qa369_multi_2 cursorid:23609826722 ntoreturn:0 keyUpdates:0 exception: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in. code:16819 numYields:13 locks(micros) r:15175 nreturned:1 reslen:186 15ms
 m30001| 2014-02-12T14:17:34.261-0800 [conn9] getmore aggShard.qa369_multi_2 cursorid:22109034986 ntoreturn:0 keyUpdates:0 exception: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in. code:16819 numYields:13 locks(micros) r:15325 nreturned:1 reslen:186 15ms

But then the result:

assert: command worked when it should have failed: { "result" : [ { "_id" : "" } ], "ok" : 1 } : undefined
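
For illustration, the check that trips here can be sketched in plain JavaScript. This is not taken from multi_spill_2.js: the helper name expectSpillFailure is hypothetical, and the shape of a correct failure reply (ok: 0 with code 16819) is an assumption based on the shard log lines above.

```javascript
// Hypothetical helper, not part of multi_spill_2.js.
// 16819 is the error code reported by both shards in the log excerpt.
const SORT_MEMORY_LIMIT_CODE = 16819;

function expectSpillFailure(res) {
    // A reply with ok: 1 means the error from the shards was lost.
    if (res.ok) {
        throw new Error("command worked when it should have failed: " +
                        JSON.stringify(res));
    }
    // Assumption: a correct failure reply carries the shard's error code.
    if (res.code !== SORT_MEMORY_LIMIT_CODE) {
        throw new Error("unexpected error code: " + JSON.stringify(res));
    }
    return true;
}
```

Passing the reply observed above, { "result" : [ { "_id" : "" } ], "ok" : 1 }, makes this helper throw, while a reply such as { ok: 0, code: 16819 } passes.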

Buildinfo for version I tested against:

> db.runCommand({buildinfo:1})
{
        "version" : "2.5.6-pre-",
        "gitVersion" : "9f847a1a91a7f1861733ce6fe1617da89dd945c0",
        "OpenSSLVersion" : "",
        "sysInfo" : "Darwin Andrew-Emil-MacBook-Pro.local 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49",
        "loaderFlags" : "-fPIC -pthread -Wl,-bind_at_load -mmacosx-version-min=10.6",
        "compilerFlags" : "-Wnon-virtual-dtor -Woverloaded-virtual -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -pipe -O3 -Wno-unused-function -Wno-deprecated-declarations -mmacosx-version-min=10.6",
        "allocator" : "tcmalloc",
        "versionArray" : [
                2,
                5,
                6,
                -100
        ],
        "javascriptEngine" : "V8",
        "bits" : 64,
        "debug" : false,
        "maxBsonObjectSize" : 16777216,
        "ok" : 1
}



 Comments   
Comment by Githook User [ 19/Feb/14 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-12708 Report errors from shards when mergingPresorted results.

To prevent similar issues from happening in the future, all code to pull
results off of the shards cursors in agg now go through a single function.
This function was enhanced to propagate runtime error codes as we already do
with parse-time errors reported from the shards.
Branch: master
https://github.com/mongodb/mongo/commit/0ffa7266291ab19cfe98109c77c9c3ac38c5e1f0
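
The idea in the commit message, a single function through which every document read from a shard cursor passes, can be sketched as follows. This is an illustrative JavaScript model only, not the server's C++ code: the names nextFromShard, ShardError, arrayCursor and drainAll are hypothetical, the sketch drains cursors one after another rather than merging presorted streams, and it assumes a shard reports a runtime failure as a document carrying "$err" and "code" fields.

```javascript
// Illustrative model only; the actual fix lives in the server's C++ code.
class ShardError extends Error {
    constructor(message, code) {
        super(message);
        this.code = code;
    }
}

// Single choke point for reading from a shard cursor.
// `cursor` is any object with hasNext()/next() (hypothetical interface).
// Assumption: a failed shard returns a document with "$err" and "code".
function nextFromShard(cursor) {
    if (!cursor.hasNext()) {
        return null;
    }
    const doc = cursor.next();
    if (doc !== null && typeof doc === "object" && "$err" in doc) {
        // Propagate the runtime error instead of treating it as a result.
        throw new ShardError(doc["$err"], doc.code);
    }
    return doc;
}

// Minimal in-memory stand-in for a shard cursor, for demonstration.
function arrayCursor(docs) {
    let i = 0;
    return { hasNext: () => i < docs.length, next: () => docs[i++] };
}

// Every consumer goes through nextFromShard, so no path can skip the check.
function drainAll(cursors) {
    const out = [];
    for (const cursor of cursors) {
        for (let d = nextFromShard(cursor); d !== null; d = nextFromShard(cursor)) {
            out.push(d);
        }
    }
    return out;
}
```

With this shape, a cursor that yields { "$err": "...", code: 16819 } causes drainAll to throw a ShardError carrying code 16819, rather than passing the error document along as if it were a result.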
