- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework, Diagnostics
- Fully Compatible
- Query 2018-12-17, Query 2018-12-31
When a sharded aggregation throws, the error response does not report where in the cluster the error was generated. To test this, I wrote a simple $assert stage that always throws.
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$assert: 1}, {$match: {x: 1}}, {$group: {_id: "$x"}}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
The error response has the same format if I force the assertion in the merging part of the pipeline:
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$match: {x: 1}}, {$group: {_id: "$x"}}, {$assert: 1}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
We suspect that the AsyncResultsMerger converts the AsyncRequestsSender::Response object from each shard into a status and throws immediately if that status is non-OK. This discards important information: the error could indicate which shard it came from. It also hides any other errors that may already have been collected from other shards.
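To illustrate the kind of reporting suggested above, here is a minimal JavaScript sketch that keeps every failing shard response instead of stopping at the first one. It is not the server implementation (which is C++), and every name in it is an assumption made for illustration: the function `summarizeShardErrors`, the `shardId` and `status` response shape, the `shardErrors` field, the " :: from shard " message suffix, and the shard names `shard0` through `shard2`.

```javascript
// Hypothetical sketch only: none of these names are MongoDB server APIs.
// Each response models what one shard returned: a shard id plus a status.
function summarizeShardErrors(responses) {
  // Collect every failure instead of throwing on the first one seen.
  const failures = responses.filter((r) => !r.status.ok);
  if (failures.length === 0) {
    return { ok: 1 };
  }
  const first = failures[0];
  return {
    ok: 0,
    // Keep the first error's code so callers that match on it still work.
    code: first.status.code,
    codeName: first.status.codeName,
    // Append the originating shard to the message (assumed format).
    errmsg: `${first.status.errmsg} :: from shard ${first.shardId}`,
    // Hypothetical extra field listing every failing shard.
    shardErrors: failures.map((f) => ({
      shard: f.shardId,
      code: f.status.code,
      errmsg: f.status.errmsg,
    })),
  };
}

// Example input: one healthy shard and two shards that hit the $assert stage.
const assertFailure = {
  ok: 0,
  code: 50893,
  codeName: "Location50893",
  errmsg: "throwing from $assert",
};
const result = summarizeShardErrors([
  { shardId: "shard0", status: { ok: 1 } },
  { shardId: "shard1", status: assertFailure },
  { shardId: "shard2", status: assertFailure },
]);
console.log(JSON.stringify(result, null, 2));
```

In this sketch the top-level code and codeName are unchanged, so existing error handling would keep working, while the message and the extra field carry the shard attribution that the current response lacks.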
This has implications for the improved $out project: a failing sharded $out would not indicate where the failures occurred, making diagnosis harder.