-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
QI 2024-03-18, QI 2024-04-01, QI 2024-04-15, QI 2024-04-29, QI 2024-05-13
Sometimes users will write queries of the form
[{$sort: {x:1}}, {$group: {_id: ..., first: {$first: "$y"}, last: {$last: "$y"}}}]
Actually, performing a complete sort is wasteful, since we only need the first and last elements per each group, not the total order.
We should rewrite these queries to the form, with no blocking sort:
[{$group: {_id: ..., first: {$top: {sortBy: "$x", output: "$y"}, last: {$bottom: {sortBy: "$x", output: "$y"}}}}]
Update:
In the rewritten query,
1) the sortBy clause of $top/$topN and $bottom/$bottomN should be sort keys.
2) the $first/$firstN should be replaced by $top/$topN group accumulator.
3) the $last/$lastN should be replaced by $bottom/$bottomN group accumulator.
4) the group-by keys do not need to be changed at all.
5) $sort should be removed.
Other requirements:
1) This optimization should still work if other accumulators are present
Update: because other accumulators' results do not depend at all on the sorted input which may span multiple groups.
2) The optimization should work even if there's a stage between the sort and group, as long as it's correct to do so. (For example, a $addFields)
Update: if there's any project-like stage in between, we need to track renames and dependencies for sort keys and group keys and referenced fields in group accumulators
3) The optimization should work for compound group keys
4) It should work for compound sort keys
5) This optimization is often applicable to time series, but it is not TS-specific.
Update:
This example does not make much sense but it would be weird to not do this if a customer ever writes this pattern of query.
6) $sort in $sort and $group {$top & $bottom} pattern is absorbed into $group if sort keys of $sort is same as the $top & $bottom's sortBy keys. For example,
[ {$sort: {a: 1}}, { $group: { _id: ..., t: {$top: {sortBy: {a: 1}, output: ...}}, b: {$bottom: {sortBy: {a: 1}, output:...}} } } ]
- is related to
-
SERVER-84347 Use DISTINCT_SCAN with $top/$bottom accumulators
- Closed
-
SERVER-28980 aggregation can subsume $sort into $group when $first/$last are present
- Backlog
-
SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan
- Closed
- split to
-
SERVER-87591 Rewrite $sort + stages + $group with $firstN/$lastN to stages + $group with $topN/$bottomN
- Open
-
SERVER-87590 Rewrite $sort + stages + $group with $first/$last to stages + $group with $top/$bottom
- Backlog
-
SERVER-87589 Rewrite adjacent $sort + $group with $first/$last to $group with $top/$bottom for timeseries
- Closed
-
SERVER-88087 Rewrite many $topNs/$bottomNs that have the same sort pattern so that it only creates one sort key
- Closed