Type: New Feature
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.3.11
Affects Version/s: 1.6.3
Component/s: Index Maintenance, Querying
Labels: query_triage
Assigned Teams: Query
Backwards Compatibility: Fully Compatible
Case:
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Issue Status as of August 23, 2016

ISSUE SUMMARY

Version 3.3.11 of MongoDB introduces support for Unicode-aware string comparisons, allowing users to issue queries that sort and match UTF-8 encoded string data in a locale-aware fashion. The server accepts a collation document that specifies the locale along with other properties of the string comparator, such as diacritic sensitivity and case sensitivity. A collation can be attached to a particular query at the operation level. Alternatively, a default collation can be specified at collection creation time; it is then used by all operations over the collection that do not specify their own.

TECHNICAL DETAILS

Syntax for specifying a collation

The collation is specified with a document of the following form:

collation: {
    locale: <string>,
    caseLevel: <bool>,
    caseFirst: <string>,
    strength: <int>,
    numericOrdering: <bool>,
    alternate: <string>,
    maxVariable: <string>,
    normalization: <bool>,
    backwards: <bool>
}

All fields are optional except locale, which is required. The list of supported locales and the documentation of all collation options are available in the Development Series 3.3.x Collation documentation.
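As a rough illustration of the required/optional rule, the sketch below builds a collation document and checks it on the client side. The helper name checkCollation is hypothetical and not part of MongoDB; the server performs its own validation, and only the field names are taken from the form shown above.

```javascript
// Field names taken from the collation document form above.
const COLLATION_FIELDS = [
    "locale", "caseLevel", "caseFirst", "strength", "numericOrdering",
    "alternate", "maxVariable", "normalization", "backwards"
];

// Hypothetical client-side check (not a MongoDB API): locale is required,
// every other field is optional, and unknown field names are reported.
function checkCollation(collation) {
    const problems = [];
    if (typeof collation.locale !== "string") {
        problems.push("locale is required and must be a string");
    }
    Object.keys(collation).forEach(function (field) {
        if (COLLATION_FIELDS.indexOf(field) === -1) {
            problems.push("unknown field: " + field);
        }
    });
    return problems;
}

// A French collation that sets only some of the optional fields.
const frenchCollation = {locale: "fr", strength: 1, backwards: true};
const okProblems = checkCollation(frenchCollation);   // empty list
const badProblems = checkCollation({strength: 2});    // reports the missing locale
```

The check is deliberately shallow: it does not validate the allowed values of each option, which are described in the collation documentation.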

Supported operations

A collation can be attached at the operation level to the following commands:

aggregate
count
distinct
find
findAndModify
geoNear
group
mapReduce
remove
update

If the collation is omitted, then the collection's default collation will be used.

An operation with a collation will use the collation for all string comparisons of stored data. If, for example, an aggregation is issued with a $match stage followed by a $sort stage with the diacritic-insensitive French collation, then the server will apply the diacritic-insensitive French semantics to both the match and the sort.
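The aggregation case described above could look roughly like the command document below. The collection name, the matched value, and the choice of strength: 1 to express diacritic insensitivity are assumptions made for illustration (strength 1 also ignores case). The second half emulates the comparison locally with JavaScript's Intl.Collator, which is a stand-in for the server's comparator and not MongoDB code.

```javascript
// Command document as it might be passed to db.runCommand() in the mongo shell.
// "myColl" and the strength value are illustrative assumptions.
const aggregateCommand = {
    aggregate: "myColl",
    pipeline: [
        {$match: {term: "cote"}},
        {$sort: {term: 1}}
    ],
    cursor: {},
    collation: {locale: "fr", strength: 1}
};

// Local emulation: sensitivity "base" compares base letters only,
// so differences in diacritics (and case) are ignored.
const baseCollator = new Intl.Collator("fr", {sensitivity: "base"});

const sampleDocs = [
    {_id: 1, term: "cote"},
    {_id: 2, term: "coté"},
    {_id: 3, term: "côte"},
    {_id: 4, term: "côté"},
    {_id: 5, term: "bonjour"}
];

// The same comparator drives both the match step and the sort step.
const matched = sampleDocs.filter(function (doc) {
    return baseCollator.compare(doc.term, "cote") === 0;
});
matched.sort(function (a, b) {
    return baseCollator.compare(a.term, b.term);
});
const matchedIds = matched.map(function (doc) { return doc._id; });
```

Under these assumptions the match keeps all four accented variants of "cote" and drops "bonjour", because the collation treats the variants as equal.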

Index support

A collation can also be associated with an index at index creation time. Indexes with a collation can support string matching and string sorting operations if the collation associated with the index is identical to the collation associated with the query. The following index types accept a collation at index build time:

btree
2dsphere

Index builds issued against a collection with a default collation will inherit the collection default unless an overriding collation is specified explicitly on the createIndex command.
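A minimal sketch of the matching rule for indexes follows. The shell call appears only in a comment, with illustrative collection and field names, and sameCollation is a hypothetical helper. It compares only the fields as written, whereas the server's own comparison may treat omitted fields as defaults.

```javascript
// In the mongo shell, an index with a collation might be created like this
// (collection and field names are illustrative):
//   db.myColl.createIndex({term: 1}, {collation: {locale: "fr_CA"}});

// Simplified, hypothetical check of the "identical collation" rule.
// It compares the two documents field by field, exactly as written.
function sameCollation(a, b) {
    const keysA = Object.keys(a).sort();
    const keysB = Object.keys(b).sort();
    if (keysA.length !== keysB.length) {
        return false;
    }
    return keysA.every(function (key, i) {
        return key === keysB[i] && a[key] === b[key];
    });
}

const indexCollation = {locale: "fr_CA"};

// Query with the same collation: the index is a candidate.
const canUseIndex = sameCollation(indexCollation, {locale: "fr_CA"});

// Query with a different strength: the index cannot serve string comparisons.
const cannotUseIndex = !sameCollation(indexCollation, {locale: "fr_CA", strength: 1});
```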

Example

The following example demonstrates how to use the mongo shell to sort strings using French Canadian comparison rules:

> db.myColl.insert([{_id: 1, "term": "cote"}, {_id: 2, "term": "coté"}, {_id: 3, "term" : "côte"}, {_id: 4, "term" : "côté"}]);
> db.myColl.find().sort({"term": -1}).collation({"locale": "fr_CA"});
{ "_id" : 4, "term" : "côté" }
{ "_id" : 2, "term" : "coté" }
{ "_id" : 3, "term" : "côte" }
{ "_id" : 1, "term" : "cote" }

Note that the order in which the result set is sorted would be different without the .collation() modifier, as the fr_CA locale includes the backwards option by default, enabling special French comparison rules for diacritical marks.
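To see why the order differs, the sketch below sorts the same four strings locally, first by plain code-unit comparison and then with JavaScript's Intl.Collator for fr-CA. This is an emulation outside the server: it assumes a JavaScript runtime with full ICU locale data (for example a recent Node.js), and the result of the collated sort depends on that data.

```javascript
const terms = ["cote", "coté", "côte", "côté"];

// Without a collation: plain code-unit comparison, descending.
// "o" sorts before "ô", so both "cô..." strings come first.
const binaryDescending = terms.slice().sort().reverse();
// -> ["côté", "côte", "coté", "cote"]

// With French Canadian rules, emulated locally via Intl.Collator.
// On a runtime with full ICU data this is expected to reproduce the
// order shown in the shell example above.
const frCa = new Intl.Collator("fr-CA");
const collatedDescending = terms.slice().sort(function (a, b) {
    return frCa.compare(b, a);
});
```

The two middle entries are the ones that swap: code-unit order compares accents left to right, while the French rules give more weight to accents nearer the end of the word.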

More details

For more thorough technical documentation, please refer to the documentation.

IMPACT ON DOWNGRADE

Downgrading from 3.4 to 3.2 is not permitted if the data files contain any collections or indexes with a collation. Before downgrading, all collections and indexes with an associated collation must be dropped.
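One way to locate what must be dropped is sketched below. The helper findCollationUsers is hypothetical. It assumes its inputs have the shape returned by db.getCollectionInfos() and by getIndexes() on each collection in the mongo shell, and the sample data is invented for illustration.

```javascript
// Hypothetical helper: given collection infos and a map of
// collection name -> index specifications, list everything with a collation.
function findCollationUsers(collectionInfos, indexesByCollection) {
    const found = {collections: [], indexes: []};
    collectionInfos.forEach(function (info) {
        if (info.options && info.options.collation) {
            found.collections.push(info.name);
        }
        (indexesByCollection[info.name] || []).forEach(function (index) {
            if (index.collation) {
                found.indexes.push(info.name + "." + index.name);
            }
        });
    });
    return found;
}

// Invented sample data shaped like the shell helpers' output.
const sampleInfos = [
    {name: "frenchColl", options: {collation: {locale: "fr"}}},
    {name: "plainColl", options: {}}
];
const sampleIndexes = {
    plainColl: [
        {name: "_id_", key: {_id: 1}},
        {name: "term_fr", key: {term: 1}, collation: {locale: "fr"}}
    ]
};

const toDrop = findCollationUsers(sampleInfos, sampleIndexes);
// toDrop.collections -> ["frenchColl"], toDrop.indexes -> ["plainColl.term_fr"]
```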

FURTHER INFORMATION

Documentation for this feature is available in the 3.3.x development series release notes. To join our beta program for Collation Support in MongoDB, and suggest improvements to our implementation, please email beta@mongodb.com.

Original description

I need MongoDB to correctly sort characters that end up in the wrong order under plain UTF-8 sorting. MySQL has a "collation" option that lets us order result lists correctly for Polish characters, e.g. utf8_polish_ci.

is depended on by

DRIVERS-291 Support providing collation per operation (Closed)
SERVER-90 case insensitive index (Closed)

is related to

SERVER-9367 toLowerCase() function does not work for Turkish char "İ" (Closed)

related to

CXX-290 Problem with Query & hint (const string &jsonKeyPatt) with compound index in locale with comma as decimal point (Closed)

Assignee: Backlog - Query Team (Inactive)
Reporter: ppalka
Participants: Andres Jaimes, Backlog - Query Team, Chris Hirt, Daniel Pasette, Daniel Walter, Dieter Guendisch, Eliot Horowitz, Eric Milkie, Florian Sesser, Hamilton Vera, Harald Lapp, Ismet Ozalp, J. Cardina, John Crenshaw, liugen, Martin Flower, Mikołaj Michalczyk, Minh Nguyen, Muhammad Hussein Fattahizadeh, Nicholas Marshall, Nikita Dedik, NOVALUE Mitar, Paco Hernández, Petr Novak, Petterson Andrade, Piotr Wilkin, ppalka, Søren Boll Overgaard, Tuner, Viktor Hedefalk
Votes: 122
Watchers: 111

Created: Oct 11 2010 10:51:44 AM UTC
Updated: Dec 06 2022 05:47:46 AM UTC
Resolved: Aug 23 2016 02:06:45 PM UTC
