[SERVER-56419] Push down $match past $setWindowFields when it keeps/drops whole partitions Created: 27/Apr/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-56583 Push $setWindowFields to shards when ... Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

We should consider swapping a $match that follows $setWindowFields so that the $match runs first. I think the swap is valid when:

  • within each partition, the predicate is either always true or always false
  • the $match doesn't depend on any 'output' field

For example, in this query:

{$setWindowFields: {
    partitionBy: ["$state", "$city"],
    output: {total: {$sum: "$x"}},
}},
{$match: {state: "NY"}},

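Running the $match first shouldn't change the result. {state: "NY"} reads only a partition-key field, so it keeps or drops each [state, city] partition as a whole, and every surviving partition still contains exactly the same documents when total is computed.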
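A toy Python model can check this claim on the example above. It is a sketch under simplifying assumptions, not server code: set_window_fields and match are hypothetical helpers, documents are plain dicts, the window covers the whole partition (the example specifies no window bounds), and output keeps input order rather than the partition-sorted order the real stage produces. Filtering before or after the window stage gives the same documents for the partition-key predicate, but not for a predicate that splits a partition.

```python
from collections import defaultdict

def set_window_fields(docs, partition_key, out_field, summed_field):
    """Hypothetical toy model of
    {$setWindowFields: {partitionBy: ..., output: {out_field: {$sum: summed_field}}}}
    with an unbounded window: each document gets the sum over its whole partition."""
    totals = defaultdict(int)
    for d in docs:
        totals[partition_key(d)] += d.get(summed_field, 0)
    # Keeps input order (a simplification; the real stage groups by partition).
    return [{**d, out_field: totals[partition_key(d)]} for d in docs]

def match(docs, pred):
    """Hypothetical toy model of $match: keep documents where pred is true."""
    return [d for d in docs if pred(d)]

docs = [
    {"state": "NY", "city": "NYC", "x": 1},
    {"state": "NY", "city": "NYC", "x": 2},
    {"state": "NY", "city": "Albany", "x": 5},
    {"state": "CA", "city": "LA", "x": 7},
]
key = lambda d: (d["state"], d["city"])  # plays the role of partitionBy

# Predicate on a partition-key field: keeps or drops whole partitions.
by_state = lambda d: d["state"] == "NY"
window_then_match = match(set_window_fields(docs, key, "total", "x"), by_state)
match_then_window = set_window_fields(match(docs, by_state), key, "total", "x")
assert window_then_match == match_then_window

# Predicate that splits the (NY, NYC) partition: the swap changes that total.
by_x = lambda d: d["x"] > 1
assert (match(set_window_fields(docs, key, "total", "x"), by_x)
        != set_window_fields(match(docs, by_x), key, "total", "x"))
```

The second pair differs because the NYC document with x = 1 contributes to total when the window runs first, but is already gone when the filter runs first.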

However, this could be tricky given how we desugar $setWindowFields:

{$set: {__tmp: ["$state", "$city"]}},
{$sort: {__tmp: 1}},
{$_internalSetWindowFields: {
    partitionBy: "$__tmp",
    output: {total: {$sum: "$x"}},
}},
{$unset: '__tmp'},
{$match: {state: "NY"}},

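It will be hard for the optimizer to see the relationship between {state: "NY"} and partitionBy: "$__tmp". The $match reads state, while the partition key is an opaque computed field, so nothing records that documents with equal __tmp values also have equal state values. Some things that could help are: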

  • a new analysis (functional dependencies), e.g. recording that __tmp determines state and city
  • the ability to $sort by an expression, instead of by a __tmp field
  • a way to defer desugaring $setWindowFields until after some optimizations have run
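The two conditions in the description could be approximated statically while the optimizer still sees the original partitionBy field paths, which is the information that deferring desugaring would preserve. The sketch below is hypothetical: the function names are invented, and extracting the set of fields a $match reads is assumed to happen elsewhere. It ignores subtleties such as null versus missing values, numeric type differences, collation, and array-valued fields, so it illustrates the rule rather than a safe implementation.

```python
def depends_only_on_partition_keys(pred_fields, partition_fields):
    """Hypothetical check: every field the predicate reads is a partition-key
    field or a subfield of one (e.g. 'state.code' under 'state'). Such a field
    has one value per partition, so the predicate is constant within it."""
    return all(
        any(f == p or f.startswith(p + ".") for p in partition_fields)
        for f in pred_fields
    )

def touches_output(pred_fields, output_fields):
    """Hypothetical check: the predicate reads a window output field, a subfield
    of one, or a parent path that contains one (e.g. 'a' when output is 'a.b')."""
    return any(
        f == o or f.startswith(o + ".") or o.startswith(f + ".")
        for f in pred_fields
        for o in output_fields
    )

def can_push_match_before_window(pred_fields, partition_fields, output_fields):
    """Both conditions from the description: the predicate keeps or drops whole
    partitions, and it does not depend on any 'output' field."""
    return (depends_only_on_partition_keys(pred_fields, partition_fields)
            and not touches_output(pred_fields, output_fields))

# The example from the description: partitionBy [state, city], output total.
print(can_push_match_before_window(["state"], ["state", "city"], ["total"]))
print(can_push_match_before_window(["x"], ["state", "city"], ["total"]))
```

The output check also covers the case where an 'output' field overwrites a partition-key field: after the window stage, a predicate on that field would see the computed value, so it must not move.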

Generated at Thu Feb 08 05:39:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.