Details
Task
Resolution: Fixed
Major - P3
None
Fully Compatible
Query 2018-12-17, Query 2018-12-31
Description
When a sharded aggregation throws, we don't report where in the cluster the error originated. To test this, I wrote a simple $assert stage that always throws.
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$assert: 1}, {$match: {x: 1}}, {$group: {_id: "$x"}}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
The error response has the same format if I force the assertion in the merging part of the pipeline instead:
mongos> db.runCommand({aggregate: "coll", cursor: {}, pipeline: [{$match: {x: 1}}, {$group: {_id: "$x"}}, {$assert: 1}]})
{
    "ok" : 0,
    "errmsg" : "throwing from $assert",
    "code" : 50893,
    "codeName" : "Location50893",
    "operationTime" : Timestamp(1533156181, 222),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1533156243, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
We suspect the AsyncResultsMerger converts the AsyncRequestsSender::Response objects from each shard into a status and immediately throws if it's non-OK. However, this loses important information: we could indicate which shard the error came from. It also hides any other errors that might have been collected.
This has implications for the improved $out project, as a failing sharded $out would not indicate where the failures occurred, making diagnosis harder.
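To illustrate the kind of reporting being asked for, here is a rough sketch in plain JavaScript. It is not the server's AsyncResultsMerger code. The function summarizeShardResponses, the shardId field on each response, and the shardErrors field in the result are all hypothetical names made up for this example. It assumes each shard response carries its shard's id alongside the usual ok/code/errmsg fields, names the failing shard in the message, and keeps every collected error rather than only the first.

```javascript
// Hypothetical sketch, not MongoDB server code.
// Assumption: each response looks like {shardId, ok, code, errmsg}.
function summarizeShardResponses(responses) {
  // Keep every non-OK response instead of throwing on the first one.
  const failures = responses.filter((r) => r.ok !== 1);
  if (failures.length === 0) {
    return { ok: 1 };
  }

  const first = failures[0];
  return {
    ok: 0,
    code: first.code,
    // Name the shard that produced the reported error.
    errmsg: `${first.errmsg} (from shard ${first.shardId})`,
    // Assumed extra field: all errors collected so far, one per failing shard.
    shardErrors: failures.map((f) => ({
      shardId: f.shardId,
      code: f.code,
      errmsg: f.errmsg,
    })),
  };
}

// Example input: one healthy shard and two failing shards (made-up data).
const exampleResponses = [
  { shardId: "shard0", ok: 1 },
  { shardId: "shard1", ok: 0, code: 50893, errmsg: "throwing from $assert" },
  { shardId: "shard2", ok: 0, code: 50893, errmsg: "throwing from $assert" },
];
const exampleSummary = summarizeShardResponses(exampleResponses);
console.log(JSON.stringify(exampleSummary, null, 2));
```

With something along these lines, the two commands above would no longer produce indistinguishable responses, because an error raised on a shard would carry that shard's id. How the merging node itself should be identified is left open here.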