- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Aggregation Framework
- Labels: None
- Query Optimization
Suppose you run the following aggregation:
db.test.aggregate([
  {$group: {
    _id: {
      date: {$dateToString: {date: "$_id", format: "%Y-%m-%d"}},
      region: "$region"
    },
    total: {$sum: 1}
  }},
  {$project: {_id: 0, region: "$_id.region", date: "$_id.date", total: 1}},
  {$out: {to: "sharded_by_region", mode: "replaceDocuments", uniqueKey: {region: 1, _id: 1}}}
])
Further suppose that the collection "sharded_by_region" has shard key {region: 1}. This pipeline looks eligible for an $exchange optimization because the shard key is preserved all the way from the $group to the $out: it is only renamed from "_id.region" to the top-level "region".
Unfortunately, our dependency/rename tracking will not consider this to be a strict rename, because it cannot figure out that "_id" won't be an array. If "_id" were an array, then the $project stage would be doing more than a rename: it would transform the array previously stored in "_id" and store the result of that transformation in "region" or "date" accordingly.
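To illustrate why an array-valued "_id" would make this more than a rename, here is a minimal JavaScript sketch of how a dotted field path such as "$_id.region" is evaluated. The helper name evalFieldPath is hypothetical and the model is simplified; it is not the server's implementation. (The $group above can never actually produce an array "_id"; the sketch only shows the case the tracker has to be conservative about.)

```javascript
// Simplified model of evaluating an aggregation field path like "$_id.region".
// `parts` is the path split on dots; undefined stands in for a missing field.
function evalFieldPath(value, parts) {
  if (parts.length === 0) return value;
  if (Array.isArray(value)) {
    // Arrays are traversed implicitly: the rest of the path is applied to each
    // document element, and the results that exist are collected into an array.
    return value
      .filter((el) => el !== null && typeof el === "object" && !Array.isArray(el))
      .map((el) => evalFieldPath(el, parts))
      .filter((r) => r !== undefined);
  }
  if (value === null || typeof value !== "object") return undefined;
  return evalFieldPath(value[parts[0]], parts.slice(1));
}

// "_id" is a document: "$_id.region" just copies the value (a pure rename).
console.log(evalFieldPath({ _id: { region: "EU" } }, ["_id", "region"]));

// "_id" is an array: the same path builds a new array (a transformation).
console.log(evalFieldPath({ _id: [{ region: "EU" }, 7, { region: "US" }] }, ["_id", "region"]));
```

In the first call the value is copied unchanged; in the second, the path produces a new array built from the matching elements, which is why the tracker cannot treat the projection as a rename without knowing the type of "_id".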
Using a $group with multiple group-by keys seems common enough for us to consider adding custom logic to communicate to the dependency/rename tracking system that we know that either (1) "_id" is not an array or (2) the pipeline will result in an error because the shard key and the uniqueKey cannot contain arrays.
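As a rough illustration of option (1), the following JavaScript toy checks whether a $group "_id" specification is an object literal with named sub-fields, and, if so, traces which projected fields are pure renames of input fields. The function names groupIdIsAlwaysObject and traceRenames are hypothetical, and the code only inspects pipeline specs as plain objects; it is a sketch of the idea, not a proposal for how the server's rename tracking should be structured.

```javascript
// Toy model only: these helpers are hypothetical and do not mirror server internals.

// True when the $group _id is an object literal with named sub-fields
// (for example {date: ..., region: "$region"}), so every output _id is a document.
function groupIdIsAlwaysObject(groupSpec) {
  const id = groupSpec._id;
  if (id === null || typeof id !== "object" || Array.isArray(id)) return false;
  const keys = Object.keys(id);
  // A key starting with "$" means an operator expression such as {$toUpper: "$x"}.
  return keys.length > 0 && keys.every((k) => !k.startsWith("$"));
}

// Maps each projected field that is a pure rename back to the input field it came from.
function traceRenames(groupSpec, projectSpec) {
  const renames = {};
  if (!groupIdIsAlwaysObject(groupSpec)) return renames;
  for (const [outName, expr] of Object.entries(projectSpec)) {
    if (typeof expr !== "string" || !expr.startsWith("$_id.")) continue;
    const subField = expr.slice("$_id.".length);
    const source = groupSpec._id[subField];
    // Only a bare top-level field path like "$region" counts as a rename here.
    if (typeof source === "string" && /^\$[^.$]+$/.test(source)) {
      renames[outName] = source.slice(1);
    }
  }
  return renames;
}

const groupSpec = {
  _id: { date: { $dateToString: { date: "$_id", format: "%Y-%m-%d" } }, region: "$region" },
  total: { $sum: 1 },
};
const projectSpec = { _id: 0, region: "$_id.region", date: "$_id.date", total: 1 };

// Reports that output "region" is a rename of input "region"; "date" is computed, so it is omitted.
console.log(traceRenames(groupSpec, projectSpec));
```

With the pipeline from this ticket, only "region" is reported as a rename, which is the field the $exchange optimization needs in order to see that the shard key survives from the $group to the $out.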