Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 1.6.3
    • Fix Version/s: 3.3.11
    • Component/s: Indexing, Querying
    • Labels:

      Description

      Issue Status as of August 23, 2016

      ISSUE SUMMARY

      Version 3.3.11 of MongoDB introduces support for Unicode-aware string comparisons, allowing users to issue queries that sort and match UTF-8 encoded string data in a locale-aware fashion. The server accepts a collation document specifying the locale, along with other properties of the string comparator such as diacritic sensitivity and case sensitivity. The collation can be attached at the operation level to a particular query. Alternatively, a default collation can be specified at collection creation time, in which case it is used by all operations over the collection that do not specify their own.

      TECHNICAL DETAILS

      Syntax for specifying a collation

      The collation is specified with a document of the following form:

      collation: {
          locale: <string>,
          caseLevel: <bool>,
          caseFirst: <string>,
          strength: <int>,
          numericOrdering: <bool>,
          alternate: <string>,
          maxVariable: <string>,
          normalization: <bool>,
          backwards: <bool>
      }
      

      All fields are optional except locale, which is required. The list of supported locales and documentation of all collation options are available in the Development Series 3.3.x Collation documentation.
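
      As an illustration of the document form above, here is a minimal sketch of three collation documents. The variable names are hypothetical; the locales and option values are ones the collation options describe, but the full set of supported values should be checked against the collation documentation.

```javascript
// Illustrative collation documents; the variable names are assumptions.

// strength 2 compares base letters and diacritics but ignores case.
var caseInsensitiveEnglish = { locale: "en_US", strength: 2 };

// strength 1 compares base letters only, ignoring diacritics and case.
var diacriticInsensitiveFrench = { locale: "fr", strength: 1 };

// numericOrdering compares runs of digits by numeric value, so "10" sorts after "2".
var numericEnglish = { locale: "en_US", numericOrdering: true };
```

      Only locale is mandatory in each document; omitted fields fall back to the defaults for that locale.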

      Supported operations

      A collation can be attached at the operation level to the following commands:

      • aggregate
      • count
      • distinct
      • find
      • findAndModify
      • geoNear
      • group
      • mapReduce
      • remove
      • update

      If the collation is omitted, then the collection's default collation will be used.

      An operation with a collation will use the collation for all string comparisons of stored data. If, for example, an aggregation is issued with a $match stage followed by a $sort stage with the diacritic-insensitive French collation, then the server will apply the diacritic-insensitive French semantics to both the match and the sort.
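
      The aggregation case described above could look roughly like the following in the mongo shell. This is a sketch, not output from a real deployment: it assumes a collection named myColl with a string field term, and it assumes strength 1 is the intended way to get diacritic-insensitive French semantics. The guard lets the snippet load outside the shell without a server.

```javascript
// Diacritic-insensitive French collation (assumption: strength 1 is what is wanted).
var frenchNoDiacritics = { locale: "fr", strength: 1 };

// $match followed by $sort; both stages compare strings with the same collation.
var pipeline = [
  { $match: { term: "cote" } },
  { $sort: { term: 1 } }
];

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  // Collation is passed once, as an option on the whole operation.
  db.myColl.aggregate(pipeline, { collation: frenchNoDiacritics }).forEach(printjson);

  // The same collation attached to a find.
  db.myColl.find({ term: "cote" }).collation(frenchNoDiacritics).forEach(printjson);
}
```

      Under this collation the $match on "cote" would be expected to also match "coté", "côte" and "côté", since they differ only in diacritics.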

      Index support

      A collation can also be associated with an index at index creation time. Indexes with a collation can support string matching and string sorting operations if the collation associated with the index is identical to the collation associated with the query. The following index types accept a collation at index build time:

      • btree
      • 2dsphere

      Index builds issued against a collection with a default collation will inherit the collection default unless an overriding collation is specified explicitly on the createIndex command.
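
      A minimal mongo shell sketch of the inheritance and override rules above. The collection name frenchTerms and the index name term_en are hypothetical, and the snippet is guarded so that it only touches a server when run inside the shell.

```javascript
// Default collation for the collection (assumed locale for this sketch).
var frCA = { locale: "fr_CA" };

// Overriding collation for one specific index.
var enUS = { locale: "en_US" };

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  // Create the collection with a default collation.
  db.createCollection("frenchTerms", { collation: frCA });

  // No collation given: the index inherits the collection default (fr_CA).
  db.frenchTerms.createIndex({ term: 1 });

  // Explicit collation overrides the default; a distinct name is supplied
  // because the key pattern is the same as the index above.
  db.frenchTerms.createIndex({ term: 1 }, { collation: enUS, name: "term_en" });
}
```

      A query can then use the first index when it runs with the fr_CA collation (explicitly or via the collection default), and the second when it specifies en_US.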

      Example

      The following example demonstrates how to use the mongo shell to sort strings using French Canadian comparison rules:

      > db.myColl.insert([{_id: 1, "term": "cote"}, {_id: 2, "term": "coté"}, {_id: 3, "term" : "côte"}, {_id: 4, "term" : "côté"}]);
      > db.myColl.find().sort({"term": -1}).collation({"locale": "fr_CA"});
      { "_id" : 4, "term" : "côté" }
      { "_id" : 2, "term" : "coté" }
      { "_id" : 3, "term" : "côte" }
      { "_id" : 1, "term" : "cote" }
      

      Without the .collation() modifier the results would be returned in a different order, because the fr_CA locale enables the backwards option by default, which applies the French rule of comparing diacritical marks starting from the end of the string.
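
      To make the difference concrete, the following plain JavaScript sketch (runnable in Node.js, no server needed) sorts the same four terms two ways: by raw code point order, which behaves like a binary comparison for these strings, and with a toy comparator that compares accents from the end of the string. The functions decompose and compareBackwardsAccents are hypothetical helpers written for this illustration; they are not how the server or ICU implements collation.

```javascript
// The four terms from the example, with escapes so the accented letters
// are unambiguous: cote, cot\u00e9 (coté), c\u00f4te (côte), c\u00f4t\u00e9 (côté).
const terms = ["cote", "cot\u00e9", "c\u00f4te", "c\u00f4t\u00e9"];

// Split a string into base letters, each with its combining marks attached.
function decompose(s) {
  const letters = [];
  for (const ch of s.normalize("NFD")) {
    if (/\p{M}/u.test(ch) && letters.length > 0) {
      letters[letters.length - 1].marks += ch;
    } else {
      letters.push({ base: ch, marks: "" });
    }
  }
  return letters;
}

// Toy comparator: base letters decide first; on a tie, accents are compared
// starting from the END of the string (the "backwards" rule). The relative
// order of two different accents is arbitrary here.
function compareBackwardsAccents(a, b) {
  const la = decompose(a);
  const lb = decompose(b);
  const baseA = la.map((l) => l.base).join("");
  const baseB = lb.map((l) => l.base).join("");
  if (baseA !== baseB) return baseA < baseB ? -1 : 1;
  for (let i = la.length - 1; i >= 0; i--) {
    if (la[i].marks !== lb[i].marks) return la[i].marks < lb[i].marks ? -1 : 1;
  }
  return 0;
}

// Descending sort by code point order.
const binaryDescending = [...terms].sort((a, b) => (a < b ? 1 : a > b ? -1 : 0));

// Descending sort with the toy backwards-accent rule.
const frenchDescending = [...terms].sort((a, b) => compareBackwardsAccents(b, a));

console.log(binaryDescending.join(", ")); // côté, côte, coté, cote
console.log(frenchDescending.join(", ")); // côté, coté, côte, cote
```

      The second line reproduces the order shown in the shell output above, while the first shows how "coté" and "côte" swap places when accents are not compared from the end.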

      More details

      For more thorough technical details, please refer to the collation documentation.

      IMPACT ON DOWNGRADE

      Downgrading from 3.4 to 3.2 is not supported if the data files contain any collections or indexes with a collation. Before downgrading, all collections and indexes with an associated collation must be dropped.
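
      One way to locate what would need to be dropped is to scan the current database for collation settings. The sketch below is an assumption-laden starting point, not an official downgrade procedure: findCollationUsers is a hypothetical helper, it inspects only the database selected in the shell, and it skips index listing for views.

```javascript
// Hypothetical helper: given the output of db.getCollectionInfos() and a map
// of collection name -> index specs, list everything that carries a collation.
function findCollationUsers(collectionInfos, indexesByCollection) {
  var hits = [];
  collectionInfos.forEach(function (info) {
    if (info.options && info.options.collation) {
      hits.push({ collection: info.name, kind: "collection default" });
    }
    (indexesByCollection[info.name] || []).forEach(function (idx) {
      if (idx.collation) {
        hits.push({ collection: info.name, kind: "index", index: idx.name });
      }
    });
  });
  return hits;
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var infos = db.getCollectionInfos();
  var indexes = {};
  infos.forEach(function (info) {
    // Views have no indexes of their own, so skip them here.
    if (info.type !== "view") {
      indexes[info.name] = db.getCollection(info.name).getIndexes();
    }
  });
  printjson(findCollationUsers(infos, indexes));
}
```

      The script would need to be repeated for each database on the deployment, and an empty result for every database is a necessary condition for the downgrade described above, not a guarantee that it will succeed.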

      FURTHER INFORMATION

      Documentation for this feature is available in the 3.3.x development series release notes. To join our beta program for Collation Support in MongoDB, and suggest improvements to our implementation, please email beta@mongodb.com.

      Original description

      I need MongoDB to correctly sort characters that come out in the wrong order when sorted as raw UTF-8. MySQL has a "collation" option that lets us order a list of results correctly by Polish characters, e.g. utf8_polish_ci.
