[SERVER-62405] Partially-streaming compound $sort Created: 06/Jan/22  Updated: 05/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-69027 [CQF] Support for Recursive Index Nav... Closed
is related to SERVER-65159 Consider {a: 1, b: 1} index to satisf... Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

In a pipeline like this:

{$sort: {x: 1}}
... // order-preserving stages
{$sort: {x: 1, y: 1}}

we could optimize the second $sort, to take advantage of the fact that the input is already sorted on {x: 1}. Any two documents with the same {x: 1} key are already next to each other, so whenever we see 'x' increase, we can start returning results before asking for more input. Depending on how big these runs of equal 'x' are, this could save a lot of space, and maybe avoid spilling.

Also, we only need to compare values of 'y', since we're only breaking ties within each run of equal 'x' values. Maybe that would be useful if 'x' is large, or the overall document is small.

This could be useful for cases like:

  • {$densify ...} {$sort: {partitionField: 1, someOtherField: 1}}, because the output of $densify will be sorted by {partitionField: 1}

    .

  • {$unpackBucket ...} {$sort: {meta: 1, temperature: 1}}, because we could push down the sort on {meta: 1}

    .

  • {$unwind ...} {$sort ...} similar to $unpackBucket.


 Comments   
Comment by David Percy [ 22/Nov/22 ]

Another way to think about this might be similar to recursive index navigation. If you have an index {a: 1}, and you want to sort by {a: 1, b: 1}, we could have a nested loop, where we get unique 'a' values and then do a separate blocking sort for each one.

This would also work for cases like {$unpackBucket ...} {$sort: {meta: 1, temperature: 1}}, assuming you have an index on {meta: 1}. Something like:

  • distinct-scan {meta: 1}
  • for each unique value:
    • unpack
    • sort by temperature

However this would not work for stages like $densify or $setWindowFields, which carry state between documents.

Comment by Pawel Terlecki [ 28/Apr/22 ]

I agree with anton.korshunov@mongodb.com. A good idea but I would defer it to the new optimizer. Probably as a separate feature(s) as we also need corresponding support in SBE.

Comment by David Percy [ 06/Jan/22 ]

This is kind of similar to streaming group, but doesn't run into the same problem where "$group equality" means something different than "$sort equality".

Generated at Thu Feb 08 05:55:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.