Details
-
Improvement
-
Status: Closed
-
Major - P3
-
Resolution: Duplicate
-
None
-
None
-
None
-
Query Optimization 2021-04-05, Query Optimization 2021-04-19, Query Optimization 2021-05-03, Query Optimization 2021-05-17, Query Optimization 2021-05-31, Query Optimization 2021-06-14
Description
There are a couple ways to write a similar query by flipping the sort and $first/$last, but consider this example:
db.my_timeseries.aggregate([
|
{$sort: {ts: 1}},
|
{$group: {_id: "$meta.x", most_recent_foo: {$last: "$foo"}}} |
])
|
This is the same pattern as described in SERVER-9507, and we should be able to have this query do something like a DISTINCT_SCAN if one of the following indexes exist:
{ts: +/-1, meta: +/-1}
|
{meta: +/-1, ts: +/-1}
|
{_id: +/-1, meta: +/-1}
|
{meta: +/-1, _id: +/-1}
|
The last two would probably be pretty challenging to implement and would have to do some similar analysis to SERVER-55106 in order to translate the "ts" predicate/scan into something on _id. It may not even be possible.
We could also instead consider transforming the query not into a distinct scan but into a reverse _id scan to ensure whatever we find is most recent, and performing a streaming $group implementation (SERVER-4507). This is generally a hard operation to perform, but within the context of a time-series collection it might be easier to prove that the optimization is correct.
Attachments
Issue Links
- is related to
-
SERVER-55106 Map predicates on max time to a portion of _id
-
- Closed
-
-
SERVER-4507 aggregation: optimize $group to take advantage of sorted sequences
-
- Backlog
-
-
SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan
-
- Closed
-
-
SERVER-37304 Extend $sort+$group+$first pipeline optimization to $last
-
- Closed
-