Core Server / SERVER-32430

DocumentSourceSort sorts array documents incorrectly if there is a non-simple collation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.6.1
    • Fix Version/s: 3.6.2, 3.7.1
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6, v3.4
    • Steps To Reproduce:

      > db.letters.insert({x: "a"})
      > db.letters.insert({x: "b"})
      > db.letters.insert({x: "c"})
      > db.letters.insert({x: ["a", "b"]})
      > db.letters.insert({x: ["b", "c"]})
      > db.letters.aggregate([{$sort: {x: 1}}], {collation: {locale: "en"}})
      { "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
      { "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
      { "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
      { "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
      { "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }
      

      The document with x: ["b", "c"] is sorted incorrectly: it appears before x: "a", even though its smallest element, "b", should place it after "a". The find command returns the correct sort order:

      > db.letters.find().sort({x: 1}).collation({locale: "en"})
      { "_id" : ObjectId("5a3ac73c3353ac98c52f66fd"), "x" : "a" }
      { "_id" : ObjectId("5a3ac7463353ac98c52f6700"), "x" : [ "a", "b" ] }
      { "_id" : ObjectId("5a3ac73e3353ac98c52f66fe"), "x" : "b" }
      { "_id" : ObjectId("5a3ac77a3353ac98c52f6701"), "x" : [ "b", "c" ] }
      { "_id" : ObjectId("5a3ac7403353ac98c52f66ff"), "x" : "c" }
      

    • Sprint:
      Query 2018-01-01

      Description

      Let's say we are performing an in-memory sort with the $sort aggregation stage, and the sort involves a non-simple collation. This is what happens in DocumentSourceSort:

      1. We create a Sorter that uses a Comparator taken from the ExpressionContext. This comparator is collation-aware.
      2. While doing work, we encounter a document whose sort field holds an array. We use the SortKeyGenerator to generate its sort key. Because the collation is non-simple, the key value is mapped to its ICU comparison key.
      3. When we are done loading documents into the Sorter, we perform a stable sort. Because the sort keys are ICU comparison keys, they should be compared with binary comparisons, but instead we use the collation-aware comparator from the ExpressionContext. The resulting sort order is meaningless.
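
      The mismatch in the steps above can be modeled with a small, self-contained JavaScript sketch. It is a toy model, not server code: ALPHABET, comparisonKey, buggySort and fixedSort are hypothetical names, the "collation" is a made-up ordering in which digits sort before letters, and real ICU comparison keys are opaque byte strings rather than digit strings. The sketch also assumes that only array documents go through the key-mapping path, as the description implies. It shows how comparing already-mapped keys with a collation-aware comparator scrambles the order, while binary comparison of consistently mapped keys does not.

```javascript
// Toy stand-in for an ICU collator (hypothetical; real ICU keys are opaque bytes).
// Only characters listed in ALPHABET are supported.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz";

// Map a string to a key whose *binary* order matches the toy collation order.
function comparisonKey(s) {
  return Array.from(s.toLowerCase(), (ch) =>
    String(ALPHABET.indexOf(ch)).padStart(2, "0")
  ).join("");
}

// Plain binary (code unit) comparison.
function binaryCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Collation-aware comparison: compares two strings via their comparison keys.
function collationAwareCompare(a, b) {
  return binaryCompare(comparisonKey(a), comparisonKey(b));
}

// Ascending sort key source: smallest element of an array, else the value itself.
function minElement(x) {
  return Array.isArray(x) ? [...x].sort(collationAwareCompare)[0] : x;
}

const docs = [{x: "a"}, {x: "b"}, {x: "c"}, {x: ["a", "b"]}, {x: ["b", "c"]}];

// Modeled bug (assumption): array documents get a comparison key, scalars keep
// their raw value, and the collation-aware comparator is applied on top.
function buggySort(input) {
  return input
    .map((doc) => {
      const min = minElement(doc.x);
      return {doc, key: Array.isArray(doc.x) ? comparisonKey(min) : min};
    })
    .sort((l, r) => collationAwareCompare(l.key, r.key))
    .map((entry) => entry.doc.x);
}

// One consistent scheme: map every key, then compare the keys in binary.
function fixedSort(input) {
  return input
    .map((doc) => ({doc, key: comparisonKey(minElement(doc.x))}))
    .sort((l, r) => binaryCompare(l.key, r.key))
    .map((entry) => entry.doc.x);
}

console.log(JSON.stringify(buggySort(docs)));
// prints [["a","b"],["b","c"],"a","b","c"]
console.log(JSON.stringify(fixedSort(docs)));
// prints ["a",["a","b"],"b",["b","c"],"c"]
```

      In this toy model the mapped key "10" (for "a") is mapped again to "0100" by the collation-aware comparator, so both array documents sort ahead of every scalar, which mirrors the aggregate output in the steps to reproduce. The second function relies on Array.prototype.sort being stable (ES2019 and later) to keep "a" ahead of ["a", "b"], as in the find output.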
