Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 4.1.4
Affects Version/s: 2.4.3
Component/s: Aggregation Framework
Labels:
- 4.1.3
- asya
- mock-pm
- optimization
- performance

Backwards Compatibility:
Fully Compatible
Sprint:
Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

This is the aggregation framework analogue of SERVER-2094 ("distinct cheat with indexes").

This performance improvement lets $group accumulators such as $first take advantage of the fact that the input to the pipeline is sorted, reducing the number of index entries scanned by skipping over large portions of the input.

For example, suppose a user has a collection with an index {x:1,y:1}, and that x has low cardinality. Consider the following pipeline:

db.foo.aggregate([{$sort: {x: 1, y: 1}}, {$group: {_id: {x: "$x"}, y: {$first: "$y"}}}])

Currently, the above pipeline performs a full scan of the index. After this optimization, it will only have to scan on the order of |x| index entries, where |x| is the number of distinct values of x, which is much smaller than the size of the index.
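
To make the difference in scanned entries concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB code: the sorted array standing in for the {x:1,y:1} index, the helper names (buildIndex, firstPerGroupFullScan, firstPerGroupSkipScan), and the sample data are all assumptions made for illustration. The binary search stands in for an index seek, and only keys actually read as group results are counted as "examined".

```javascript
// Hypothetical in-memory model of an index on {x: 1, y: 1}:
// an array of [x, y] keys kept in sorted order.
function buildIndex(docs) {
  return docs
    .map((d) => [d.x, d.y])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);
}

// Baseline: visit every key and keep the first y seen for each x.
function firstPerGroupFullScan(index) {
  const result = new Map();
  let examined = 0;
  for (const [x, y] of index) {
    examined++;
    if (!result.has(x)) result.set(x, y);
  }
  return { result, examined };
}

// Skip scan: read the first key of a group, then seek directly to the
// first key whose x is larger, without reading the rest of the group.
function firstPerGroupSkipScan(index) {
  const result = new Map();
  let examined = 0;
  let pos = 0;
  while (pos < index.length) {
    const [x, y] = index[pos];
    examined++;
    result.set(x, y);
    // Binary search (modelling an index seek) for the smallest
    // position after pos whose x is greater than the current x.
    let lo = pos + 1;
    let hi = index.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (index[mid][0] <= x) lo = mid + 1;
      else hi = mid;
    }
    pos = lo;
  }
  return { result, examined };
}

// Sample data (an assumption): 3 distinct x values, 1000 documents each.
const docs = [];
for (let x = 0; x < 3; x++) {
  for (let y = 0; y < 1000; y++) docs.push({ x, y: (y * 7 + x) % 1000 });
}

const index = buildIndex(docs);
const full = firstPerGroupFullScan(index);
const skip = firstPerGroupSkipScan(index);

console.log(full.examined); // 3000
console.log(skip.examined); // 3
```

Both functions return the same first y per x, but the skip scan reads one key per distinct x rather than one key per document. In a real B-tree each seek has its own logarithmic cost, which this sketch deliberately leaves out of the count.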

This ticket was filed as a result of discussion in SERVER-9272 (full use case available there).

is depended on by

SERVER-36517 Allow allPaths indexes to provide DISTINCT_SCAN

Closed

is duplicated by

SERVER-9272 Querying latest document based on a set of field

Closed

SERVER-31269 Too many documents examined when using an index and $first/$last in $group stage

Closed

is related to

SERVER-69359 Aggregate query bails on DISTINCT_SCAN and uses IXSCAN

Closed

SERVER-97238 $group with $first/$last to distinct scan optimization might incorrectly unwind arrays

Closed

SERVER-2130 Ability to use Limit() with Distinct()

Backlog

SERVER-27915 Make $group with $addToSet accumulator use DISTINCT_SCAN when applicable

Backlog

SERVER-2094 distinct cheat with indexes

Closed

SERVER-15291 slow '$group' performance

Closed

SERVER-29244 CLONE - distinct cheat with indexes

Closed

related to

SERVER-37459 $group with $$ROOT returns error

Closed

SERVER-85213 Rewrite $sort+$group with $first/$last to use $top/$bottom

Backlog

SERVER-4507 aggregation: optimize $group to take advantage of sorted sequences

Backlog

SERVER-23732 Aggregation should optimize an irrelevant $sort preceding a $group

Backlog

SERVER-28980 aggregation can subsume $sort into $group when $first/$last are present

Backlog

SERVER-37715 Use DISTINCT_SCAN for $unwind-$group pipelines

Backlog

SERVER-37304 Extend $sort+$group+$first pipeline optimization to $last

Closed

SERVER-40090 DISTINCT_SCAN in agg is only used when certain format of _id is specified

Closed

SERVER-55576 Optimize queries on time-series collections which request the most recent value

Closed


Assignee: Justin Seyster
Reporter: Backlog - Query Team (Inactive)
Participants: Backlog - Query Team, Githook User, Ian Whalen, J Rassi, Justin Seyster, Mervin San Andres
Votes: 9
Watchers: 23

Created: Apr 29 2013 10:10:37 PM UTC
Updated: Feb 18 2025 06:48:23 PM UTC
Resolved: Sep 26 2018 08:00:26 PM UTC
Confidence Status Last Update: 02/Aug/18 4:18 PM
