Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Aggregation Framework, Diagnostics
Backwards Compatibility: Fully Compatible
Sprint: Query 2018-12-17, Query 2018-12-31

When a sharded aggregation throws, we don't report where in the cluster the error was generated. To demonstrate this, I wrote a simple $assert stage that always throws. Placing it first in the pipeline, so that it runs in the shards part:
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$assert: 1}, {$match: {x: 1}}, {$group: {_id: "$x"}}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
The error is identical if I instead force the assertion in the merger part of the pipeline:
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$match: {x: 1}}, {$group: {_id: "$x"}}, {$assert: 1}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
We suspect that the AsyncResultsMerger converts the AsyncRequestsSender::Response object from each shard into a status and throws immediately if that status is non-OK. This discards important information: we could report which shard the error came from. It also hides any other errors that might have been collected.
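The suspected behavior described above, and the alternative it implies, can be illustrated with a toy model. This is a minimal sketch, not MongoDB server code: the response shape `{ shardId, ok, code, errmsg }` and the functions `mergeFailFast` and `mergeWithShardContext` are hypothetical, chosen only to contrast "throw on the first non-OK status" with "collect every failure and keep the shard that produced it".

```javascript
// Hypothetical model, not MongoDB server code. Each per-shard response is a
// plain object: { shardId, ok, code, errmsg }.

// Suspected current behavior: turn each response into a status and throw on
// the first failure. The shard id and any later failures are discarded.
function mergeFailFast(responses) {
  for (const r of responses) {
    if (!r.ok) {
      const err = new Error(r.errmsg);
      err.code = r.code;
      throw err;
    }
  }
  return responses;
}

// Alternative: gather every failure and keep the shard that produced it.
function mergeWithShardContext(responses) {
  const failures = responses.filter((r) => !r.ok);
  if (failures.length === 0) {
    return responses;
  }
  // Keep the first failure's code, but name every failing shard in the message.
  const err = new Error(
    failures.map((r) => `[${r.shardId}] ${r.errmsg}`).join("; ")
  );
  err.code = failures[0].code;
  err.shardIds = failures.map((r) => r.shardId);
  throw err;
}

// Example: one shard succeeds, two fail with the $assert error.
const responses = [
  { shardId: "shard0", ok: true },
  { shardId: "shard1", ok: false, code: 50893, errmsg: "throwing from $assert" },
  { shardId: "shard2", ok: false, code: 50893, errmsg: "throwing from $assert" },
];

for (const merge of [mergeFailFast, mergeWithShardContext]) {
  try {
    merge(responses);
  } catch (e) {
    console.log(`${merge.name}: ${e.message}`);
  }
}
```

In this sketch the fail-fast version reports only "throwing from $assert", exactly as in the transcripts, while the alternative names both failing shards. Keeping the first failure's error code is one possible design choice so that callers matching on `code` would still see 50893; how the real fix surfaces shard information is not stated here.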
This matters for the improved $out project: a failing sharded $out would not indicate where the failure occurred, making diagnosis harder.