- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- None
- Query Optimization
We should consider swapping a $match with the $setWindowFields that precedes it, so that the $match runs first. I think the swap is valid when:
- within each partition, the predicate is either always true or always false
- the $match doesn't depend on any 'output' field
For example, in this query:
{$setWindowFields: {
    partitionBy: ["$state", "$city"],
    output: {total: {$sum: "$x"}},
}},
{$match: {state: "NY"}},
Doing the $match first shouldn't change the result: the predicate only tests "state", which is part of the partition key, so it either keeps or drops each partition as a whole.
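To make the claim concrete, here is a toy in-memory model of the two stages in plain JavaScript. It is only a sketch, not how the server evaluates pipelines: the sample documents and the helper names (windowSum, matchNY, matchX) are made up for illustration, and the window is assumed to be the default one (the whole partition).

```javascript
// Toy model of the two stages; sample data and helper names are hypothetical.
const docs = [
  { state: "NY", city: "Albany", x: 1 },
  { state: "NY", city: "Albany", x: 2 },
  { state: "NY", city: "Buffalo", x: 5 },
  { state: "CA", city: "Fresno", x: 7 },
];

// Models {$setWindowFields: {partitionBy: ["$state", "$city"],
//                            output: {total: {$sum: "$x"}}}}
// with the default window, i.e. the sum over the whole partition.
function windowSum(input) {
  const totals = new Map();
  for (const d of input) {
    const key = JSON.stringify([d.state, d.city]);
    totals.set(key, (totals.get(key) ?? 0) + d.x);
  }
  return input.map((d) => ({
    ...d,
    total: totals.get(JSON.stringify([d.state, d.city])),
  }));
}

// Models {$match: {state: "NY"}}: constant within every partition.
const matchNY = (input) => input.filter((d) => d.state === "NY");

const matchLast = matchNY(windowSum(docs));
const matchFirst = windowSum(matchNY(docs));
const sameForState =
  JSON.stringify(matchLast) === JSON.stringify(matchFirst);

// A predicate on x is not constant within a partition, so the swap
// changes the totals (the Albany partition loses one document).
const matchX = (input) => input.filter((d) => d.x > 1);
const sameForX =
  JSON.stringify(matchX(windowSum(docs))) ===
  JSON.stringify(windowSum(matchX(docs)));

console.log({ sameForState, sameForX });
```

The second comparison shows why the first condition matters: once a predicate can remove some documents of a partition but not others, the window function sees different input.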
However, this could be tricky given how we desugar $setWindowFields:
{$set: {__tmp: ["$state", "$city"]}},
{$sort: {__tmp: 1}},
{$_internalSetWindowFields: {
    partitionBy: "$__tmp",
    output: {total: {$sum: "$x"}},
}},
{$unset: '__tmp'},
{$match: {state: "NY"}},
It will be hard for the optimizer to see the relationship between {state: "NY"} and partitionBy: "$__tmp". Some things that could help are:
- a new analysis (functional dependency)
- ability to $sort by expression, instead of a __tmp field
- a way to defer desugaring $setWindowFields until after some optimization
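If desugaring were deferred, the check could run on the original stage, where the partitionBy field paths are still visible. The sketch below is one deliberately conservative way to do that, not an existing server API: canMatchMoveFirst is a hypothetical helper, it only accepts a partitionBy that is a field path or an array of field paths, and it only accepts scalar equality predicates. Operators such as $exists are rejected on the assumption that they might tell apart documents that share a partition key (for example missing versus null).

```javascript
// Hypothetical helper, not part of the MongoDB server.
// Returns true only when a $match is obviously safe to run before the
// (not yet desugared) $setWindowFields stage; false means "don't know".
function canMatchMoveFirst(setWindowFields, match) {
  const { partitionBy, output = {} } = setWindowFields;

  // Accept only "$field" or ["$a", "$b", ...]; reject other expressions.
  const exprs = Array.isArray(partitionBy) ? partitionBy : [partitionBy];
  const isPath = (e) =>
    typeof e === "string" && e.startsWith("$") && !e.startsWith("$$");
  if (exprs.length === 0 || !exprs.every(isPath)) return false;
  const partitionFields = new Set(exprs.map((e) => e.slice(1)));

  // Top-level names written by 'output'; a predicate must not touch them.
  const outputRoots = new Set(
    Object.keys(output).map((k) => k.split(".")[0])
  );

  // Accept only equality against a string, number, or boolean.
  const isScalar = (v) =>
    ["string", "number", "boolean"].includes(typeof v);

  const fields = Object.keys(match);
  return (
    fields.length > 0 &&
    fields.every(
      (f) =>
        !f.startsWith("$") && // no $and / $or / $expr
        partitionFields.has(f) && // constant within each partition
        !outputRoots.has(f.split(".")[0]) && // independent of 'output'
        isScalar(match[f])
    )
  );
}

const stage = {
  partitionBy: ["$state", "$city"],
  output: { total: { $sum: "$x" } },
};
console.log(canMatchMoveFirst(stage, { state: "NY" })); // partition field
console.log(canMatchMoveFirst(stage, { total: 3 })); // output field
```

A purely syntactic test like this misses valid cases (for example a predicate on a field that is functionally determined by the partition key), which is where the functional dependency analysis would come in.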
- related to SERVER-56583 Push $setWindowFields to shards when shards contain whole partitions (Backlog)