[SERVER-34741] Move $match in front of $group if condition is on group key Created: 28/Apr/18 Updated: 29/Oct/23 Resolved: 20/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1, 4.2.12 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Optimization |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | neweng, optimization, performance, read-only-views | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Optimization
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Requested: |
v4.2
|
||||||||
| Sprint: | Query 2019-06-03, Query 2019-06-17, Query 2019-07-01 | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
In case of views created on results of a $group, if the subsequent query is filtering on _id field then it should be possible to push that $match in front of $group transforming it into filter against whatever field is being grouped. Example: Vew definition:
Query on view:
Pipeline transformation:
|
| Comments |
| Comment by Githook User [ 16/Dec/20 ] | |||||||||||
|
Author: {'name': 'George Wangensteen', 'email': 'george.wangensteen@10gen.com'}Message: (cherry picked from commit 55e76198b1e41b5dbd5868d2ed8b914da29f0f29) | |||||||||||
| Comment by Githook User [ 20/Jun/19 ] | |||||||||||
|
Author: {'name': 'George Wangensteen', 'email': 'george.wangensteen@10gen.com'}Message: | |||||||||||
| Comment by Asya Kamsky [ 06/Jul/18 ] | |||||||||||
|
I got to thinking that this optimization may not be correct to do in the following scenario:
If the actual usage of the pipeline is inside expressive $lookup:
Here the pipeline was written specifically to allow the first (group) stage to be non-correlated, so it will only execute once for outer aggregation, with just a single $match for each document streaming through. Optimizing the inner expressive lookup pipeline here would defeat the plan by always first doing match on single value and then doing $group - while overall it's a smaller $group, it may be the wrong approach to reduce overall amount of work. So we should be careful about where this optimization is applied when we implement this. |