Core Server / SERVER-32430

DocumentSourceSort sorts array documents incorrectly if there is a non-simple collation

    • Fully Compatible
    • ALL
    • v3.6, v3.4
    • Steps to reproduce:
      > db.letters.insert({x: "a"})
      > db.letters.insert({x: "b"})
      > db.letters.insert({x: "c"})
      > db.letters.insert({x: ["a", "b"]})
      > db.letters.insert({x: ["b", "c"]})
      > db.letters.aggregate([{$sort: {x: 1}}], {collation: {locale: "en"}})
      { "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
      { "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
      { "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
      { "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
      { "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }
      

      The document with x: ["b", "c"] is sorted incorrectly: it appears before the document with x: "a", even though its smallest element is "b". The find command returns the correct sort order:

      > db.letters.find().sort({x: 1}).collation({locale: "en"})
      { "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
      { "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
      { "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
      { "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
      { "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }
      
    • Query 2018-01-01

      Let's say we are performing an in-memory sort with the $sort aggregation stage, and the sort involves a non-simple collation. This is what happens in DocumentSourceSort:

      1. We create a Sorter that uses a Comparator taken from the ExpressionContext. This comparator is collation-aware.
      2. While doing work, we encounter a document with an array. We use the SortKeyGenerator to generate the sort key. Because the collator is non-simple, the value is mapped to its ICU comparison key.
      3. When we are done loading documents into the Sorter, we perform a stable sort. Because we are sorting ICU comparison keys, we should be using binary comparisons, but instead we use the collation-aware comparator from the ExpressionContext, which applies the collation a second time to keys that already encode it. The resulting order is therefore meaningless.
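
The effect described in step 3 can be illustrated with a small, self-contained sketch in plain JavaScript. It does not use ICU or any server code: `comparisonKey`, `binaryCompare`, and `collationCompare` are hypothetical helpers, and the "collation" is a made-up one that orders lowercase letters in reverse. The only point is that once values have been mapped to comparison keys, comparing those keys with the collation-aware comparator gives a different order than the intended binary comparison.

```javascript
// Hypothetical toy collation (an assumption for illustration, not ICU):
// it orders lowercase letters in reverse, so "c" sorts before "a".
// comparisonKey() plays the role of an ICU comparison key: the binary
// order of the keys equals the collation order of the original strings.
function comparisonKey(s) {
  return Array.from(s, (ch) =>
    String.fromCharCode(219 - ch.charCodeAt(0)) // 'a' (97) <-> 'z' (122)
  ).join("");
}

// Plain code-unit comparison, standing in for a binary comparison.
function binaryCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Collation-aware comparator meant for raw strings, not for keys.
function collationCompare(a, b) {
  return binaryCompare(comparisonKey(a), comparisonKey(b));
}

const values = ["a", "c", "b"];
const keyed = values.map((v) => ({ value: v, key: comparisonKey(v) }));

// Intended behavior: keys already encode the collation, so compare them as bytes.
const correct = [...keyed]
  .sort((l, r) => binaryCompare(l.key, r.key))
  .map((e) => e.value);

// Behavior analogous to the bug: the collation is applied again to the keys.
const buggy = [...keyed]
  .sort((l, r) => collationCompare(l.key, r.key))
  .map((e) => e.value);

console.log(correct); // [ 'c', 'b', 'a' ] (the toy collation's order)
console.log(buggy);   // [ 'a', 'b', 'c' ] (not the collation's order)
```

In this toy case the second application happens to undo the first, which makes the mismatch easy to see. With real ICU comparison keys the keys are opaque byte strings, so running them through the collator again produces an order with no useful meaning.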

            Assignee: Kyle Suarez (kyle.suarez@mongodb.com)
            Reporter: Kyle Suarez (kyle.suarez@mongodb.com)
            Votes: 0
            Watchers: 8
