Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 1.6.3
    • Fix Version/s: 3.3.11
    • Component/s: Indexing, Querying
    • Labels:

      Description

      Issue Status as of August 23, 2016

      ISSUE SUMMARY

      Version 3.3.11 of MongoDB introduces support for Unicode-aware string comparisons, allowing users to issue queries that sort and match UTF-8 encoded string data in a locale-aware fashion. The server accepts a collation document that specifies the locale along with other properties of the string comparator, such as diacritic sensitivity and case sensitivity. A collation can be attached to a particular query at the operation level. Alternatively, a default collation can be specified at collection creation time, in which case it is used by all operations over the collection.

      TECHNICAL DETAILS

      Syntax for specifying a collation

      The collation is specified with a document of the following form:

      collation: {
          locale: <string>,
          caseLevel: <bool>,
          caseFirst: <string>,
          strength: <int>,
          numericOrdering: <bool>,
          alternate: <string>,
          maxVariable: <string>,
          normalization: <bool>,
          backwards: <bool>
      }
      

      All fields are optional, except for the locale field, which is required. The list of supported locales as well as documentation of all collation options is available here: Development Series 3.3.x Collation.
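      As a rough illustration of the document shape above, the following sketch checks a collation document on the client side before it is sent to the server. The helper checkCollation and the table COLLATION_FIELDS are hypothetical names made up for this example and are not part of MongoDB; the option values are illustrative only, and the server performs its own, stricter validation (for instance of the locale name).

```javascript
// Hypothetical client-side check, not a MongoDB API: verifies that a collation
// document uses only the fields listed above, with the listed types.
const isBool = (v) => typeof v === "boolean";
const isString = (v) => typeof v === "string";

const COLLATION_FIELDS = new Map([
  ["locale", isString],
  ["caseLevel", isBool],
  ["caseFirst", isString],
  ["strength", Number.isInteger],
  ["numericOrdering", isBool],
  ["alternate", isString],
  ["maxVariable", isString],
  ["normalization", isBool],
  ["backwards", isBool],
]);

function checkCollation(collation) {
  // locale is the only required field.
  if (!isString(collation.locale)) {
    throw new Error("collation.locale is required and must be a string");
  }
  for (const [field, value] of Object.entries(collation)) {
    const hasExpectedType = COLLATION_FIELDS.get(field);
    if (hasExpectedType === undefined) {
      throw new Error("unknown collation field: " + field);
    }
    if (!hasExpectedType(value)) {
      throw new Error("collation." + field + " has the wrong type");
    }
  }
  return collation;
}

// A minimal collation: only the required locale.
const minimalCollation = checkCollation({ locale: "fr_CA" });

// A fuller collation; these option values are examples, not recommendations.
const detailedCollation = checkCollation({ locale: "fr", strength: 2, backwards: true });
```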

      Supported operations

      A collation can be attached at the operation level to the following commands:

      • aggregate
      • count
      • distinct
      • find
      • findAndModify
      • geoNear
      • group
      • mapReduce
      • remove
      • update

      If the collation is omitted, then the collection's default collation will be used.

      An operation with a collation will use the collation for all string comparisons of stored data. If, for example, an aggregation is issued with a $match stage followed by a $sort stage with the diacritic-insensitive French collation, then the server will apply the diacritic-insensitive French semantics to both the match and the sort.
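      The aggregation described above might look like the following command document. This is a sketch under assumptions: the collection name myColl and the field term are borrowed from the example later in this description, and strength 1 (compare base characters only) is assumed here as the way to request diacritic-insensitive comparison; note that this strength also ignores case.

```javascript
// Assumption: strength 1 compares base characters only, so diacritics (and
// case) are ignored by both the $match and the $sort stage.
const frenchNoDiacritics = { locale: "fr", strength: 1 };

// Command document for an aggregation with an operation-level collation.
const aggregateCommand = {
  aggregate: "myColl",
  pipeline: [
    { $match: { term: "cote" } }, // string equality evaluated under the collation
    { $sort: { term: 1 } },       // ordering evaluated under the same collation
  ],
  cursor: {},
  collation: frenchNoDiacritics,
};

// In the mongo shell, against a server that supports collation, either of
// these forms should carry the collation to the server:
//   db.runCommand(aggregateCommand);
//   db.myColl.aggregate(aggregateCommand.pipeline, { collation: frenchNoDiacritics });
```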

      Index support

      A collation can also be associated with an index at index creation time. An index with a collation can support string matching and string sorting operations only if the collation associated with the index is identical to the collation associated with the query. The following index types accept a collation at index build time:

      • btree
      • 2dsphere

      Index builds issued against a collection with a default collation will inherit the collection default unless an overriding collation is specified explicitly on the createIndex command.
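      The inheritance rule above can be sketched with the create and createIndexes command documents. The collection name, index name, and locales are illustrative, and effectiveIndexCollation is a hypothetical helper that only mirrors the rule as stated here; it is not how the server is implemented.

```javascript
// Collection created with a default collation.
// Shell equivalent: db.createCollection("myColl", { collation: { locale: "fr_CA" } })
const createCommand = { create: "myColl", collation: { locale: "fr_CA" } };

// One index that inherits the collection default and one that overrides it.
// Shell equivalent of the second: db.myColl.createIndex({ term: 1 }, { collation: { locale: "en" } })
const createIndexesCommand = {
  createIndexes: "myColl",
  indexes: [
    { key: { author: 1 }, name: "author_default" },
    { key: { term: 1 }, name: "term_en", collation: { locale: "en" } },
  ],
};

// Hypothetical helper: the collation an index build ends up with, following
// the rule that an explicit collation overrides the collection default.
function effectiveIndexCollation(indexSpec, collectionDefault) {
  return indexSpec.collation !== undefined ? indexSpec.collation : collectionDefault;
}

const effectiveCollations = createIndexesCommand.indexes.map((spec) => ({
  name: spec.name,
  collation: effectiveIndexCollation(spec, createCommand.collation),
}));
```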

      Example

      The following example demonstrates how to use the mongo shell to sort strings using French Canadian comparison rules:

      > db.myColl.insert([{_id: 1, "term": "cote"}, {_id: 2, "term": "coté"}, {_id: 3, "term" : "côte"}, {_id: 4, "term" : "côté"}]);
      > db.myColl.find().sort({"term": -1}).collation({"locale": "fr_CA"});
      { "_id" : 4, "term" : "côté" }
      { "_id" : 2, "term" : "coté" }
      { "_id" : 3, "term" : "côte" }
      { "_id" : 1, "term" : "cote" }
      

      Note that the order in which the result set is sorted would be different without the .collation() modifier, as the fr_CA locale includes the backwards option by default, enabling special French comparison rules for diacritical marks.

      More details

      For more thorough technical details, please refer to the documentation.

      IMPACT ON DOWNGRADE

      Downgrading from 3.4 to 3.2 is not supported if the data files contain any collections or indexes with a collation. Before downgrading, all collections and indexes with an associated collation must be dropped.
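      To find what would have to be dropped, something like the following sketch could be used. It assumes that a collection's default collation appears under options.collation in the output of the shell's db.getCollectionInfos(), and that a collated index carries a collation field in the output of getIndexes(). The helper findCollated is hypothetical, and views or other special namespaces are not handled.

```javascript
// Hypothetical helper: lists collections and indexes that carry a collation.
// collectionInfos: array shaped like the result of db.getCollectionInfos().
// getIndexes: function returning the index specs of a collection by name.
function findCollated(collectionInfos, getIndexes) {
  const found = [];
  for (const info of collectionInfos) {
    if (info.options && info.options.collation) {
      found.push({ collection: info.name });
    }
    for (const index of getIndexes(info.name)) {
      if (index.collation) {
        found.push({ collection: info.name, index: index.name });
      }
    }
  }
  return found;
}

// Assumed usage in the mongo shell:
//   findCollated(db.getCollectionInfos(),
//                (name) => db.getCollection(name).getIndexes());
```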

      FURTHER INFORMATION

      Documentation for this feature is available in the 3.3.x development series release notes. To join our beta program for Collation Support in MongoDB, and suggest improvements to our implementation, please email beta@mongodb.com.

      Original description

      I need MongoDB to sort characters properly; they currently come out in the wrong order when sorting UTF-8 strings. MySQL has a "collation" option that makes result lists sort correctly by Polish characters, e.g. utf8_polish_ci.

          Activity

          mnmldr Nikita Dedik added a comment -

          Very sad that there's no reaction from MongoDB developers on such an important issue for YEARS.

          pacohernandezg Paco Hernández added a comment - - edited

          I have seen that WiredTiger has an interface that allows a custom ordering to be provided:
          http://source.wiredtiger.com/2.3.1/struct_w_t___c_o_l_l_a_t_o_r.html

          https://github.com/mongodb/mongo/blob/de54755e568481d1bdef37339d899403e3b04d86/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp

          Would it be possible to implement a patch for the new WiredTiger engine in MongoDB 2.8?

          Thank you.

          chris@hirtfamily.net Chris Hirt added a comment -

          +1 Just like to say this issue is really important for us! Wish I had known before we chose Mongo! Storing the sort key as others have mentioned is our strategy going forward, although I haven't worked out yet how long that will take to implement. Our server-side application is written in PHP, and so I found this blog post on persisting sort keys helpful. http://derickrethans.nl/mongodb-collation.html

          pwilkin Piotr Wilkin added a comment -

          Could we please get some status updates on this? For many non-English speaking users, this is a pretty important issue.

          hedefalk Viktor Hedefalk added a comment -

          Yey! Seeing this is resolved makes me super happy! Great!

