[SERVER-62405] Partially-streaming compound $sort Created: 06/Jan/22 Updated: 05/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | David Percy | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||
| Participants: | |||||||||||||
| Description |
|
In a pipeline like this:
we could optimize the second $sort, to take advantage of the fact that the input is already sorted on {x: 1}. Any two documents with the same {x: 1} key are already next to each other, so whenever we see 'x' increase, we can start returning results before asking for more input. Depending on how big these runs of equal 'x' are, this could save a lot of space, and maybe avoid spilling. Also, we only need to compare values of 'y', since we're only breaking ties within each run of equal 'x' values. Maybe that would be useful if 'x' is large, or the overall document is small. This could be useful for cases like:
|
| Comments |
| Comment by David Percy [ 22/Nov/22 ] |
|
Another way to think about this might be similar to recursive index navigation. If you have an index {a: 1}, and you want to sort by {a: 1, b: 1}, we could have a nested loop, where we get unique 'a' values and then do a separate blocking sort for each one. This would also work for cases like {$unpackBucket ...} {$sort: {meta: 1, temperature: 1}}, assuming you have an index on {meta: 1}. Something like:
However this would not work for stages like $densify or $setWindowFields, which carry state between documents. |
| Comment by Pawel Terlecki [ 28/Apr/22 ] |
|
I agree with anton.korshunov@mongodb.com. A good idea but I would defer it to the new optimizer. Probably as a separate feature(s) as we also need corresponding support in SBE. |
| Comment by David Percy [ 06/Jan/22 ] |
|
This is kind of similar to streaming group, but doesn't run into the same problem where "$group equality" means something different than "$sort equality". |