[SERVER-29444] Use a covered and streaming plan for $group, $sum:1 queries Created: 04/Jun/17 Updated: 06/Dec/22
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Nicolas Dascanio | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 11 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Query Optimization |
| Sprint: | Query 2019-07-29 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
A very common use of an aggregation is to count elements by group:
If there is an index on category, this pipeline could be covered, and no documents would need to be retrieved. Since this is such a common pattern, you could take it into account and make use of indexes.
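What "covered" buys here can be illustrated with a toy Python model. This is a sketch under stated assumptions, not MongoDB's implementation: `ToyCollection`, `build_category_index`, `count_with_collscan`, and `count_with_covered_ixscan` are hypothetical names, the "index" is just a sorted list of `(category, record id)` pairs, and the sample documents are made up. Both plans compute the same per-category counts (a `$group` on `category` with `$sum: 1`), but only the collection scan fetches documents.

```python
from collections import Counter


class ToyCollection:
    """Hypothetical stand-in for a collection that counts document fetches."""

    def __init__(self, docs):
        self._docs = dict(docs)  # record id -> document
        self.fetches = 0

    def record_ids(self):
        return list(self._docs)

    def fetch(self, rid):
        self.fetches += 1
        return self._docs[rid]


def build_category_index(docs):
    """Toy index on {category: 1}: sorted (key, record id) entries."""
    return sorted((doc["category"], rid) for rid, doc in docs.items())


def count_with_collscan(coll):
    """Collection-scan plan: fetch every document to read its group key."""
    counts = Counter()
    for rid in coll.record_ids():
        counts[coll.fetch(rid)["category"]] += 1
    return dict(counts)


def count_with_covered_ixscan(index):
    """Covered plan: the group key is read from the index entry itself,
    so the collection is never touched."""
    return dict(Counter(key for key, _rid in index))


docs = {
    1: {"category": "books", "title": "A"},
    2: {"category": "music", "title": "B"},
    3: {"category": "books", "title": "C"},
    4: {"category": "games", "title": "D"},
}
coll = ToyCollection(docs)
index = build_category_index(docs)

by_scan = count_with_collscan(coll)          # fetches all 4 documents
by_index = count_with_covered_ixscan(index)  # fetches none
```

Avoiding fetches is only part of the story: as the discussion below points out, when both the index and the collection are already in memory, a covered index scan is not necessarily faster than a collection scan.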
| Comments |
| Comment by David Storch [ 06/Aug/19 ] |
After reviewing this again, I agree that this is not quite a duplicate of an existing ticket. It's closely related to SERVER-4507 (streaming $group) and
| Comment by Nicolas Dascanio [ 24/Jun/17 ] |
I understand that when both the index and the collection are in memory, a full scan of the index and a full scan of the collection should perform similarly. However, indexes should always be "hot" or fit entirely in memory, while it's not always possible to keep an entire collection in memory. You may have several collections with all of their indexes in memory; while any one collection may fit in memory, it's unlikely that all of them can. Obviously, this depends heavily on the size of the collections. Thanks for the interest and the responses.
| Comment by Asya Kamsky [ 24/Jun/17 ] |
stoma, to add to what's been said already: in your example, if there is an index on category, this pipeline could be covered and no document retrieval would be necessary. However, our testing has consistently shown that unless you have huge documents, or the collection is cold (not in memory) while the index is hot (in memory), the covered index scan for the pipeline you provide is not faster than a collection scan. You can see some of this discussed in SERVER-23406, which tracks ways we may be able to improve index scans.
In addition, we are tracking work in SERVER-4507 to let $group take advantage of sorted input (for example, when the group key arrives sorted because it comes from an index). Again, preliminary testing indicated that for the most common case (like yours) the speedup is only on the order of 10%, which is why that work wasn't prioritized higher.
I believe your request is really for $group to be faster in simple pipelines like the one you list, rather than merely for it to use an index.
Best,
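The difference between a blocking (hash-style) $group and the streaming $group tracked in SERVER-4507 can be sketched in Python. This is an illustration under assumptions, not server code: `tracked`, `blocking_group_count`, and `streaming_group_count` are hypothetical names, and the streaming version relies on `itertools.groupby`, which is only correct when the input is already sorted on the group key (as it would be when read from an index). The sketch records how much input each strategy must consume before it can emit its first group.

```python
from itertools import groupby


def tracked(items, consumed):
    """Yield items unchanged while recording every item that was read."""
    for item in items:
        consumed.append(item)
        yield item


def blocking_group_count(keys):
    """Hash-style group: reads all input before emitting anything and
    keeps one counter per distinct key in memory."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    yield from counts.items()


def streaming_group_count(sorted_keys):
    """Streaming group: assumes input is sorted on the group key, so a
    group is complete as soon as the next key differs."""
    for key, run in groupby(sorted_keys):
        yield key, sum(1 for _ in run)


keys = ["books", "books", "games", "music"]  # sorted, as if read from an index

seen_blocking = []
first_blocking = next(blocking_group_count(tracked(keys, seen_blocking)))

seen_streaming = []
first_streaming = next(streaming_group_count(tracked(keys, seen_streaming)))
```

Both strategies emit `("books", 2)` first, but the blocking version has read all 4 keys by then, while the streaming version has read only 3 (it needs one key past the run to know the run ended). The streaming form holds a single running count rather than one per group. Given the roughly 10% speedup reported above, its main practical benefit may lie in memory use and earlier output rather than raw throughput. If the input were not sorted, `groupby` would split a key into several groups, so a planner would presumably choose this strategy only when the sort order is guaranteed.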
| Comment by Kyle Suarez [ 20/Jun/17 ] |
| Comment by Kyle Suarez [ 05/Jun/17 ] |
david.storch – my bad, I indeed ran that test on my development branch for stoma, we are currently working on Regards,
| Comment by Nicolas Dascanio [ 05/Jun/17 ] |
Kyle Suarez, yes, it's a typo. Feel free to edit the description:
| Comment by David Storch [ 05/Jun/17 ] |
kyle.suarez, I'm not observing the same thing on master:
But won't this be fixed once your changes for
| Comment by Kyle Suarez [ 05/Jun/17 ] |
Note that the user is missing the "$" in front of category, but I assume this is just a small typo in the description of this ticket.
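The missing "$" matters because, in an aggregation expression, a string that starts with "$" is a field path, while any other string is a constant. As a result, grouping on the bare string "category" puts every document into a single group whose _id is that literal string. The Python sketch below models this rule in simplified form; `eval_id_expr` and `group_count` are hypothetical helpers, and the sketch deliberately ignores `$$` variables, dotted paths, and object-valued `_id` expressions.

```python
from collections import Counter


def eval_id_expr(expr, doc):
    """Simplified _id evaluation: '$field' reads a top-level field;
    any other value is treated as a constant."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def group_count(docs, id_expr):
    """Count documents per evaluated _id, like $group with $sum: 1."""
    return dict(Counter(eval_id_expr(id_expr, doc) for doc in docs))


docs = [{"category": "books"}, {"category": "music"}, {"category": "books"}]

with_dollar = group_count(docs, "$category")  # one group per category value
without_dollar = group_count(docs, "category")  # one group: the literal string
```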
| Comment by Kelsey Schubert [ 05/Jun/17 ] |
Pretty sure we do this already in 3.4: