[SERVER-9507] Optimize $sort+$group+$first pipeline to avoid full index scan Created: 29/Apr/13 Updated: 07/Sep/22 Resolved: 26/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 2.4.3 |
| Fix Version/s: | 4.1.4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Backlog - Query Team (Inactive) | Assignee: | Justin Seyster |
| Resolution: | Done | Votes: | 9 |
| Labels: | 4.1.3, asya, mock-pm, optimization, performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | | ||
| Backwards Compatibility: | Fully Compatible | ||
| Sprint: | Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08 | ||
| Participants: | | ||
| Case: | (copied to CRM) | ||
| Description |
|
This is an analogue to a related ticket. This performance improvement allows $group accumulators such as $first to take advantage of the fact that the input to the pipeline is sorted, and thus to reduce the number of index entries scanned by "skipping" over large portions of the index. For example, suppose a user has a collection with an index {x:1,y:1}, and that x has low cardinality. Consider the following pipeline:
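A minimal sketch of the pipeline shape described above, assuming a collection named coll (the collection name and the output field name are illustrative, not taken from the original report):

```js
// Assumed: db.coll has the index {x: 1, y: 1} and x has low cardinality.
db.coll.aggregate([
  { $sort: { x: 1, y: 1 } },                           // sort order matches the index
  { $group: { _id: "$x", firstY: { $first: "$y" } } }  // first (smallest) y per distinct x
]);
```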
Currently, the above pipeline performs a full scan of the index. After this optimization, it only has to scan on the order of |x| index entries (roughly one per distinct value of x), which is much smaller than the size of the index when x has low cardinality. This ticket was filed as a result of a related discussion. |
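As a hedged sketch of how the effect could be observed, one might compare explain output before and after the optimization; the collection name is an assumption, and the exact plan stage reported for the optimized form may vary by server version:

```js
// Assumed collection "coll" with the index {x: 1, y: 1}.
const explain = db.coll.explain().aggregate([
  { $sort: { x: 1, y: 1 } },
  { $group: { _id: "$x", firstY: { $first: "$y" } } }
]);
// Before the optimization the winning plan is a plain IXSCAN over the
// whole {x: 1, y: 1} index; after it, the scan can skip ahead between
// distinct values of x, touching roughly |x| index entries.
printjson(explain);
```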
| Comments |
| Comment by Githook User [ 26/Sep/18 ] |
|
Author: Justin Seyster <justin.seyster@mongodb.com> (jseyster)
Message: |
| Comment by Ian Whalen (Inactive) [ 20/Sep/18 ] |
|
Target date: definitely end of this sprint (8 weeks). |
| Comment by Ian Whalen (Inactive) [ 06/Sep/18 ] |
|
Target date: end of this sprint (6 weeks). |
| Comment by J Rassi [ 02/May/13 ] |
|
Correct, the aggregation framework currently cannot use an index to help optimize those pipelines (which is unrelated to this ticket). If an index cannot be used to satisfy a $sort, an in-memory sort is performed, in which case all documents in the pipeline have to be examined anyway, so no significant performance improvement can be made. If an index can be used to satisfy a $sort, and only a small subset of documents is needed by a later pipeline stage (in a way that can exploit the sort order), then the optimization suggested here will drastically reduce the number of index entries scanned. |
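To make the two cases concrete, here is a hedged sketch (the collection and field names are assumptions, not taken from the comment):

```js
// Case 1: the $sort matches the {x: 1, y: 1} index, so the sort is
// satisfied by an index scan and the $group/$first stage can benefit
// from skipping between distinct values of x.
db.coll.aggregate([
  { $sort: { x: 1, y: 1 } },
  { $group: { _id: "$x", firstY: { $first: "$y" } } }
]);

// Case 2: no index covers {z: 1}, so the $sort is a blocking in-memory
// sort; every document must be examined regardless, and the proposed
// optimization cannot help.
db.coll.aggregate([
  { $sort: { z: 1 } },
  { $group: { _id: "$x", firstZ: { $first: "$z" } } }
]);
```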
| Comment by Mervin San Andres [ 30/Apr/13 ] |
|
What if, prior to this set of operators, I have to apply other pipeline stages such as $project and $unwind? As far as I know, indexes can no longer be used after such a transformation. |
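For illustration only (the fields and the unwound array are assumptions), this is the kind of pipeline the question describes: once $project or $unwind has reshaped the documents, the later $sort no longer corresponds to keys of an existing index such as {x: 1, y: 1}, so it cannot be satisfied by an index scan:

```js
// Hypothetical pipeline: $unwind and $project run before the $sort, so
// the sort operates on transformed documents and falls back to an
// in-memory sort rather than using the {x: 1, y: 1} index.
db.coll.aggregate([
  { $unwind: "$tags" },
  { $project: { x: 1, y: 1, tag: "$tags" } },
  { $sort: { tag: 1, y: 1 } },                          // no index on {tag: 1, y: 1}
  { $group: { _id: "$tag", firstY: { $first: "$y" } } }
]);
```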