Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24621

Make find command min/max options respect the collation



    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.11
    • Component/s: Querying
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
    • Sprint:
      Query 18 (08/05/16)


      If the cursor.min() or cursor.max() options are supplied to the find command with string boundary values, then the string comparisons used to define the boundary should be collation-aware. Consider the following query:

      db.collection.find().min({str: "x"}).collation({locale: "en_US", strength: 2});

      This query respects the case-insensitive US English collation, and therefore strings beginning with both lowercase "x" and uppercase "X" should match. Similarly, min or max queries where the boundary is a nested array or object must respect the collation. For example,

      db.collection.find().min({str: {val: "x"}}).collation({locale: "en_US", strength: 2});

      should match documents where str is either {val: "x"} or {val: "X"}.

      Since min and max require a compatible index, this query will only succeed if there is an index on the collection with the key pattern {str: 1} that has the case-insensitive US English collation. If, however, the min/max boundary values do not include string, nested object, or nested array values, then the index collation is not required to match the query collation. For example, the query

      db.collection.find().min({a: 1, b: 1}).max({a: 5, b: 5}).collation({locale: "en_US", strength: 2});

      can use any index with the key pattern {a: 1, b: 1}. The index is not required to have the case-insensitive US English collation in order to be compatible.

      Original description

      cursor.min and cursor.max both accept an index key parameter. For expression indexes, we don't translate the index key provided by the query through the normal index key generation path.

      This currently results in the behavior below for hashed indexes, and other expression indexes.

      > db.test.ensureIndex({ "a" : "hashed" })
              "createdCollectionAutomatically" : true,
              "numIndexesBefore" : 1,
              "numIndexesAfter" : 2,
              "ok" : 1
      > db.test.insert({ a: "a" })
      WriteResult({ "nInserted" : 1 })
      > db.test.find().min({a: "a"}).max({a: "b"})
      > db.test.find().hint({a: "hashed"}).returnKey()
      { "a" : NumberLong("2780795045148116090") }
      > db.test.find().min({a: NumberLong("2780795045148116090") }).max({a: NumberLong("2780795045148116091") })
      { "_id" : ObjectId("57630a3664248634c5bf5f27"), "a" : "a" }

      The index key used by min/max corresponds exactly with the key stored in the index.

      For collation-aware indexes, min/max key queries will need to use ICU-translated comparison keys in order to yield any meaningful results. This is likely non-obvious to users.

      Additionally, min and max are used internally by sharding, so it is unlikely that changing behavior to translate these keys will be straightforward.

      In combination with collation, the behavior of min/max index selection, bounds generation and sorting is confusing.

      > db.coll.ensureIndex({a: 1, b: 1})
      > db.coll.insert({a: 1, b: "a"}, {a: 1, b: "A"})
      > db.coll.find({}, {_id: 0})
                   .min({a: 1, b: "A"})
                   .max({a: 1, b: "b"})
                   .collation({locale: "en_US", strength: 1})
                   .sort({a: 1, b: 1})
      { "a" : 1, "b" : "a" }

      Since the requested collation is case-insensitive (strength: 1), the user may expect both objects containing "a" and "A" to be returned. However, min/max only returns "a" since in the index ordering (according to simple binary comparison) "A" < "a").




            david.storch David Storch
            david.hatch David Hatch
            0 Vote for this issue
            3 Start watching this issue