[SERVER-24518] MERGE_SORT_STAGE can be used more aggressively when OR_STAGE index sort orders match Created: 10/Jun/16 Updated: 31/Oct/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Hatch | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | neweng, qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||
| Sprint: | Query 2020-04-06, Query 2020-04-20, Query 2020-05-04, Query 2020-05-18 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Currently, we only replace an OR_STAGE with a MERGE_SORT_STAGE when the sort order of the underlying indexes are shared and match the desired sort order of the query. The OR_STAGE uses unbounded memory to de-duplicate records. The MERGE_SORT_STAGE currently seems to do the same, but could be optimized to use far less memory than the OR_STAGE since duplicate records are coming in from child stages in the same order. Given that MERGE_SORT_STAGE can be optimized to avoid unbounded memory, we should consider aggressively using it instead of OR_STAGEs, and re-SORTing afterwards (which happens in the OR_STAGE case as well). |
| Comments |
| Comment by David Storch [ 10/Jun/16 ] |
|
Right, good point, I think we would have to clear MergeSortStage::_seen every time the value of the sort key of the documents being merged changes. |
| Comment by David Hatch [ 10/Jun/16 ] |
|
Yeah, and I think we'd be able to reduce memory usage in the MERGE_SORT stage, right? |
| Comment by David Storch [ 10/Jun/16 ] |
|
david.hatch, are you saying that we should use a MERGE_SORT stage when possible, even if the sort obtained by the MERGE_SORT does not satisfy the user requested sort (and hence a subsequent SORT stage would be required) and even if the user did not request a sort? If I understand correctly, it's a very interesting idea |