[SERVER-85424] $group relies on stable sorting Created: 18/Jan/24 Updated: 19/Jan/24 Resolved: 19/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Brad Cater | Assignee: | David Percy |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Steps To Reproduce: | > db.coll.insertMany([ }} <---- This should be {"f": 20} because we sorted by [b, c].}} |
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
We would like to use std::sort (unstable sorting) instead of std::stable_sort in sorter.cpp because of the performance gains. While experimenting with this change, I found that $group with $first relies on stable sorting. With unstable sorting, the behavior does not match the documentation at https://www.mongodb.com/docs/manual/reference/operator/aggregation/first/#behaviors, specifically > To define the document order for $first in other pipeline stages, add a preceding $sort stage. The sorter change is in https://github.com/10gen/mongo/pull/17859. cc david.percy@mongodb.com This may be related to SERVER-85337. |
| Comments |
| Comment by Brad Cater [ 19/Jan/24 ] | |||||||||||||||||||||||||||
|
Closing this because the suggested repro demonstrates the intended behavior, not a bug. | |||||||||||||||||||||||||||
| Comment by David Percy [ 19/Jan/24 ] | |||||||||||||||||||||||||||
|
Stable vs unstable sorting only affects how ties are handled, and in this example there are no ties (every document has a unique combination of {b, c} values). So I don't think this sorter change could have affected this example. $group+$first does rely on "predictable ordering" but not "stable sorting". $first picks whichever input comes first, so its output is only predictable if the input order is predictable. $sort is unpredictable when there are ties, because different query plans / different indexes may break ties differently.
The behavior in the example looks correct to me. The {a: 2} group contains these docs:
so the only choices for 'c' are 30 or 40. $first picks the {b: 10, c: 30} document because it sorts earlier by {b:1, c:1}.
To double check I ran the example on a branch with no sorter changes:
|