[SERVER-24518] MERGE_SORT_STAGE can be used more aggressively when OR_STAGE index sort orders match Created: 10/Jun/16  Updated: 31/Oct/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Hatch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: neweng, qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-27658 $or should choose SORT_MERGE instead ... Closed
Related
related to SERVER-41142 Compound index on an array field is n... Backlog
Assigned Teams:
Query Optimization
Sprint: Query 2020-04-06, Query 2020-04-20, Query 2020-05-04, Query 2020-05-18
Participants:

 Description   

Currently, we only replace an OR_STAGE with a MERGE_SORT_STAGE when the sort order of the underlying indexes are shared and match the desired sort order of the query.

The OR_STAGE uses unbounded memory to de-duplicate records. The MERGE_SORT_STAGE currently seems to do the same, but could be optimized to use far less memory than the OR_STAGE since duplicate records are coming in from child stages in the same order.

Given that MERGE_SORT_STAGE can be optimized to avoid unbounded memory, we should consider aggressively using it instead of OR_STAGEs, and re-SORTing afterwards (which happens in the OR_STAGE case as well).



 Comments   
Comment by David Storch [ 10/Jun/16 ]

Right, good point, I think we would have to clear MergeSortStage::_seen every time the value of the sort key of the documents being merged changes.

Comment by David Hatch [ 10/Jun/16 ]

Yeah, and I think we'd be able to reduce memory usage in the MERGE_SORT stage, right?

Comment by David Storch [ 10/Jun/16 ]

david.hatch, are you saying that we should use a MERGE_SORT stage when possible, even if the sort obtained by the MERGE_SORT does not satisfy the user requested sort (and hence a subsequent SORT stage would be required) and even if the user did not request a sort? If I understand correctly, it's a very interesting idea

Generated at Thu Feb 08 04:06:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.