Details
Improvement
Resolution: Unresolved
Major - P3
Query Optimization
Description
We should consider swapping a $match with a preceding $setWindowFields, so that the $match runs first. I think this is valid when:
- within each partition, the predicate is either always true or always false
- the $match doesn't depend on any 'output' field
For example, in this query:
{$setWindowFields: {
    partitionBy: ["$state", "$city"],
    output: {total: {$sum: "$x"}},
}},
{$match: {state: "NY"}},
Doing the $match first shouldn't change the result, because it drops whole partitions.
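To make the claim concrete, here is a toy in-memory model of the two stages in plain JavaScript. It is only a sketch: the sample documents and the helper names (windowSum, matchNY, matchBigX) are made up for illustration and are not server code. It compares both orderings for a predicate on a partition key, and for a predicate on "$x", which is not constant within a partition.

```javascript
// Toy model of the pipeline above; sample data and helper names are hypothetical.
const docs = [
  {state: "NY", city: "Albany", x: 1},
  {state: "NY", city: "Albany", x: 2},
  {state: "NY", city: "Buffalo", x: 5},
  {state: "CA", city: "Fresno", x: 7},
];

// Models {$setWindowFields: {partitionBy: ["$state", "$city"],
//                            output: {total: {$sum: "$x"}}}}.
function windowSum(input) {
  const totals = new Map();
  for (const d of input) {
    const key = JSON.stringify([d.state, d.city]);
    totals.set(key, (totals.get(key) || 0) + d.x);
  }
  return input.map(d => ({...d, total: totals.get(JSON.stringify([d.state, d.city]))}));
}

// Models {$match: {state: "NY"}}: constant within every partition.
function matchNY(input) {
  return input.filter(d => d.state === "NY");
}

// Models {$match: {x: {$gte: 2}}}: NOT constant within a partition.
function matchBigX(input) {
  return input.filter(d => d.x >= 2);
}

const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Predicate on a partition key: both orders agree.
const swapIsSafeForState = same(matchNY(windowSum(docs)), windowSum(matchNY(docs)));

// Predicate on a non-key field: the Albany total changes from 3 to 2.
const swapIsSafeForX = same(matchBigX(windowSum(docs)), windowSum(matchBigX(docs)));

console.log({swapIsSafeForState, swapIsSafeForX});
```

The second comparison fails because filtering on "$x" removes only part of the Albany partition, so the window sum is computed over fewer documents.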
However, this could be tricky given how we desugar $setWindowFields:
{$set: {__tmp: ["$state", "$city"]}},
{$sort: {__tmp: 1}},
{$_internalSetWindowFields: {
    partitionBy: "$__tmp",
    output: {total: {$sum: "$x"}},
}},
{$unset: '__tmp'},
{$match: {state: "NY"}},
It will be hard for the optimizer to see the relationship between {state: "NY"} and partitionBy: "$__tmp". Some things that could help are:
- a new analysis (functional dependency)
- ability to $sort by expression, instead of a __tmp field
- a way to defer desugaring $setWindowFields until after some optimization
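As a rough illustration of the last option, the check is much simpler if it runs on the original $setWindowFields spec, before desugaring. The sketch below is hypothetical (canMoveMatchBefore and partitionFields are not server functions) and deliberately conservative: it only accepts a $match made of scalar equality predicates on fields that appear verbatim in partitionBy and are not overwritten by 'output'. It ignores value-comparison subtleties such as null versus missing fields, or collation.

```javascript
// Hypothetical helpers, not server code. They return true only for the
// simplest shape and false whenever they are unsure.

// Extracts plain field names from partitionBy, or null if any entry is not
// a simple field path such as "$state".
function partitionFields(partitionBy) {
  const exprs = Array.isArray(partitionBy) ? partitionBy : [partitionBy];
  const fields = [];
  for (const e of exprs) {
    if (typeof e !== "string" || !e.startsWith("$") || e.startsWith("$$")) {
      return null;
    }
    fields.push(e.slice(1));
  }
  return fields;
}

function canMoveMatchBefore(setWindowFieldsSpec, matchSpec) {
  const fields = partitionFields(setWindowFieldsSpec.partitionBy);
  if (fields === null) return false;
  const outputs = Object.keys(setWindowFieldsSpec.output || {});
  const scalarTypes = ["string", "number", "boolean"];
  return Object.keys(matchSpec).every(path =>
    !path.startsWith("$") &&                       // reject $and, $or, $expr, ...
    scalarTypes.includes(typeof matchSpec[path]) && // equality to a scalar only
    fields.includes(path) &&                        // reads only a partition key
    !outputs.includes(path));                       // key not overwritten by 'output'
}

const spec = {partitionBy: ["$state", "$city"], output: {total: {$sum: "$x"}}};
console.log(canMoveMatchBefore(spec, {state: "NY"})); // partition key
console.log(canMoveMatchBefore(spec, {total: 5}));    // depends on an output field
```

After desugaring, the same check would have to see through the __tmp field, which is where a functional-dependency analysis would come in.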
Issue Links
- related to SERVER-56583 Push $setWindowFields to shards when shards contain whole partitions (Backlog)