[SERVER-85424] $group relies on stable sorting Created: 18/Jan/24  Updated: 19/Jan/24  Resolved: 19/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Brad Cater Assignee: David Percy
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-81571 Reconsider stable sort in sorter.cpp Blocked
Related
is related to SERVER-85337 $bucketAuto $first does not obey inpu... In Code Review
Operating System: ALL
Steps To Reproduce:

> db.coll.insertMany([
  {a: 1, b: 2, c: 3},
  {a: 1, b: 2, c: 4},
  {a: 1, b: 3, c: 4},
  {a: 1, b: 10, c: 20},
  {a: 2, b: 20, c: 30},
  {a: 2, b: 30, c: 40},
  {a: 2, b: 10, c: 30}
])
{{{}}
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("65a9a670f6bb772ae439c829"),
                ObjectId("65a9a670f6bb772ae439c82a"),
                ObjectId("65a9a670f6bb772ae439c82b"),
                ObjectId("65a9a670f6bb772ae439c82c"),
                ObjectId("65a9a670f6bb772ae439c82d"),
                ObjectId("65a9a670f6bb772ae439c82e"),
                ObjectId("65a9a670f6bb772ae439c82f")
        ]
}
> db.coll.aggregate([
  {$sort: {b: 1, c: 1,}}
  {$group: {_id: "$a", f: {$first: "$c"}}}
])
{{

{ "_id" : 1, "f" : 3 }

}}
{{

{ "_id" : 2, "f" : 30 }

<---- This should be {"f": 20} because we sorted by [b, c].}}

Participants:

 Description   

We would like to use std::sort (unstable sorting) instead of std::stable_sort in sorter.cpp because of the performance gains. While experimenting with this change, I found that $group with $first relies on stable sorting. With unstable sorting, the behavior does not match the documentation at https://www.mongodb.com/docs/manual/reference/operator/aggregation/first/#behaviors, specifically

> To define the document order for $first in other pipeline stages, add a preceding $sort stage.

The sorter change is in https://github.com/10gen/mongo/pull/17859.

cc david.percy@mongodb.com This may be related to SERVER-85337.



 Comments   
Comment by Brad Cater [ 19/Jan/24 ]

Closing this because the suggested repro demonstrates the intended behavior, not a bug.

Comment by David Percy [ 19/Jan/24 ]

Stable vs unstable sorting only affects how ties are handled, and in this example there are no ties (every document has a unique combination of {b, c} values).  So I don't think this sorter change could have affected this example.

$group+$first does rely on "predictable ordering" but not "stable sorting".  $first picks whichever input comes first, so its output is only predictable if the input order is predictable.  $sort is unpredictable when there are ties, because different query plans / different indexes may break ties differently.

 


The behavior in the example looks correct to me. The {a: 2} group contains these docs:

  {a: 2, b: 20, c: 30},
  {a: 2, b: 30, c: 40},
  {a: 2, b: 10, c: 30}

so the only choices for 'c' are 30 or 40.  $first picks the {b: 10, c: 30} document because it sorts earlier by {b:1, c:1}.

 

To double check I ran the example on a branch with no sorter changes:

> db.coll.insertMany([
...   {a: 1, b: 2, c: 3},
...   {a: 1, b: 2, c: 4},
...   {a: 1, b: 3, c: 4},
...   {a: 1, b: 10, c: 20},
...   {a: 2, b: 20, c: 30},
...   {a: 2, b: 30, c: 40},
...   {a: 2, b: 10, c: 30}
... ])
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("65a9b708dd08f28938a21b8a"),
                ObjectId("65a9b708dd08f28938a21b8b"),
                ObjectId("65a9b708dd08f28938a21b8c"),
                ObjectId("65a9b708dd08f28938a21b8d"),
                ObjectId("65a9b708dd08f28938a21b8e"),
                ObjectId("65a9b708dd08f28938a21b8f"),
                ObjectId("65a9b708dd08f28938a21b90")
        ]
}
> db.coll.aggregate([   {$sort: {b: 1, c: 1,}},   {$group: {_id: "$a", f: {$first: "$c"}}} ])
{ "_id" : 1, "f" : 3 }
{ "_id" : 2, "f" : 30 } 

Generated at Thu Feb 08 06:57:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.