[SERVER-32430] DocumentSourceSort sorts array documents incorrectly if there is a non-simple collation Created: 20/Dec/17  Updated: 30/Oct/23  Resolved: 22/Dec/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.1
Fix Version/s: 3.6.2, 3.7.1

Type: Bug Priority: Critical - P2
Reporter: Kyle Suarez Assignee: Kyle Suarez
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-32956 Unblacklist aggregation/sources/sort/... Closed
Duplicate
is duplicated by SERVER-32297 Aggregations that merge on mongos do ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6, v3.4
Steps To Reproduce:

> db.letters.insert({x: "a"})
> db.letters.insert({x: "b"})
> db.letters.insert({x: "c"})
> db.letters.insert({x: ["a", "b"]})
> db.letters.insert({x: ["b", "c"]})
> db.letters.aggregate([{$sort: {x: 1}}], {collation: {locale: "en"}})
{ "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
{ "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
{ "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
{ "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
{ "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }

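The document with x: ["b", "c"] is sorted incorrectly: in an ascending sort an array is ordered by its smallest element, here "b", so it must not appear before the document with x: "a". The find command gets the sort order right: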

> db.letters.find().sort({x: 1}).collation({locale: "en"})
{ "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
{ "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
{ "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
{ "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
{ "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }
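As a rough illustration of why the find output above is the expected one, here is a minimal sketch that models the ascending sort key of an array as its smallest element under the collation. It is a simplified model run in plain JavaScript, not the server's implementation; the helper name ascendingSortKey is hypothetical, and Intl.Collator("en") stands in for the {locale: "en"} collation.

```javascript
// Simplified model (assumption): for an ascending sort, an array is ordered
// by its smallest element; scalars are ordered by their own value.
const collator = new Intl.Collator("en");

// Hypothetical helper; assumes non-empty arrays of strings.
function ascendingSortKey(value) {
  if (!Array.isArray(value)) return value;
  return value.reduce((min, v) => (collator.compare(v, min) < 0 ? v : min));
}

// Same documents, in the same insertion order, as in the steps to reproduce.
const docs = [
  { x: "a" },
  { x: "b" },
  { x: "c" },
  { x: ["a", "b"] },
  { x: ["b", "c"] },
];

// Array.prototype.sort is stable, so documents with equal keys keep insertion order.
const sorted = [...docs].sort((l, r) =>
  collator.compare(ascendingSortKey(l.x), ascendingSortKey(r.x))
);

const expectedOrder = sorted.map((d) => JSON.stringify(d.x));
console.log(expectedOrder.join("  "));
```

Under this model ["b", "c"] has the key "b", so it can only appear next to the document with x: "b", never ahead of x: "a".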

Sprint: Query 2018-01-01
Participants:

 Description   

Let's say we are performing an in-memory sort with the $sort aggregation stage, and the sort involves a non-simple collation. This is what happens in DocumentSourceSort:

  1. We create a Sorter that uses a Comparator taken from the ExpressionContext. This comparator is collation-aware.
  2. While doing work, we encounter a document with an array. We use the SortKeyGenerator to generate the sort key. Because the collator is non-simple, the value is mapped to its ICU comparison key.
  3. When we are done loading documents into the Sorter, we perform a stable sort. Because we are sorting ICU comparison keys, we should be using binary comparisons, but instead we are using the collation-aware comparator from the ExpressionContext. The comparator therefore applies the collation to values that are already comparison keys, so the resulting order is meaningless.
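The mismatch in steps 2 and 3 can be sketched with a toy collation. Everything below is an assumption made for illustration: the "collation" is simply reverse alphabetical order over a-z, comparisonKey is a hypothetical stand-in for the ICU comparison key, and none of the names correspond to server code. The point is only that keys must be compared bytewise, and that running them through a collation-aware comparator applies the collation a second time.

```javascript
// Toy stand-in for a non-simple collation: reverse alphabetical order (a-z only).
// comparisonKey() plays the role of the ICU comparison key: the binary order of
// the keys equals the collation order of the original strings.
function comparisonKey(s) {
  return Array.from(s, (ch) =>
    String.fromCharCode("a".charCodeAt(0) + "z".charCodeAt(0) - ch.charCodeAt(0))
  ).join("");
}

// Plain binary (code unit) comparison, which is what keys should be sorted with.
function binaryCompare(l, r) {
  return l < r ? -1 : l > r ? 1 : 0;
}

// Collation-aware comparator: expects raw strings and maps them to keys itself.
function collationCompare(l, r) {
  return binaryCompare(comparisonKey(l), comparisonKey(r));
}

const values = ["a", "c", "b"];
const keys = values.map(comparisonKey); // step 2: values mapped to comparison keys

// This toy key function is its own inverse, so it also maps keys back to values.
// Correct: comparison keys sorted with binary comparison.
const correct = [...keys].sort(binaryCompare).map(comparisonKey);
// Buggy: comparison keys sorted with the collation-aware comparator (step 3).
const buggy = [...keys].sort(collationCompare).map(comparisonKey);

console.log("binary compare on keys:   ", correct.join(", "));
console.log("collation compare on keys:", buggy.join(", "));
```

In this toy the second application of the collation happens to invert the order exactly. With real ICU keys the result has no such structure; it is just an arbitrary order.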


 Comments   
Comment by Kyle Suarez [ 29/Jan/18 ]

Filed and linked SERVER-32956.

Comment by Kelsey Schubert [ 29/Jan/18 ]

New ticket, please.

Comment by Kyle Suarez [ 29/Jan/18 ]

Oops. Is it possible to still throw in a commit under this ticket or should we just file a new one?

Comment by Charlie Swanson [ 26/Jan/18 ]

Looks like we forgot to remove this when we linked SERVER-32297 as a duplicate: https://github.com/mongodb/mongo/blob/0aeb5ce7e8d4a190dac43fd110533eef149f7880/buildscripts/resmokeconfig/suites/aggregation_sharded_collections_passthrough.yml#L41

Comment by Githook User [ 02/Jan/18 ]

Author: Kyle Suarez (ksuarz) <kyle.suarez@mongodb.com>

Message: SERVER-32297, SERVER-32430 fix $sort in-memory sort and $sortKey serialization

(cherry picked from commit 79352e71b697cb8c126510095bba7fd816128701)
Branch: v3.6
https://github.com/mongodb/mongo/commit/5bb6e3cb98e404a55d5edba802d24559eca20d6b

Comment by Githook User [ 22/Dec/17 ]

Author: Kyle Suarez (ksuarz) <kyle.suarez@mongodb.com>

Message: SERVER-32297, SERVER-32430 fix $sort in-memory sort and $sortKey serialization
Branch: master
https://github.com/mongodb/mongo/commit/79352e71b697cb8c126510095bba7fd816128701

Comment by Kyle Suarez [ 20/Dec/17 ]

I'm marking this as 3.6 Required because it's a query correctness bug.
