[SERVER-90] case insensitive index Created: 03/Jun/09 Updated: 07/Apr/23 Resolved: 23/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.11 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Eliot Horowitz (Inactive) | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Done | Votes: | 359 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Assigned Teams: |
Query
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Participants: |
Arkadiy Kukarkin, Backlog - Query Team, chuong nguyen, Clifford Hammerschmidt, Colin Mollenhour, Damjan Rems, dane truelson, Daniel Pasette, Doug Mayer, Eliot Horowitz, Hendy Irawan, Idris Mokhtarzada, Ivan Fioravanti, Jesper Erik Bendtsen, Joel Sanderson, John Page, jonathan roeder, Josef Sábl, Laurentiu Macovei, Oleg Vivtash, Ovidiu Anicai, Paco Hernández, Phil Idem, Rahul B. Mattte [X], Remon van Vliet, Robert La Ferla, Roberto Germano Vieweg Neto, Sean Wolfe, Sonal Shah, Tim Hardy, Tim Hawkins, Tuner
|
||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||
| Description |
|
Issue Status as of Aug 23, 2016 ISSUE SUMMARY Case-insensitive indices are available in MongoDB 3.3.11 via the new collation feature. Collations allow queries to match and sort UTF-8 string data in a locale-aware fashion, but also allow users to configure case sensitivity. For general documentation of collation, see TECHNICAL DETAILS Case-insensitivity can be specified as a collation with a strength of 2. This strength indicates that the collator should make case-insensitive comparisons. (Strength 1, the weakest comparison level, is both case-insensitive and diacritic-insensitive. Higher strengths are more discriminating whereas lower strengths are less discriminating.) Case-insensitivity, like any collation, can be specified at the level of an operation, an index, or as a collection default. Queries or index builds with no explicit collation will inherit the collection's default collation. An index is eligible for use by case-insensitive string matching or sorting operations if its case-insensitive collation matches that of the query. ExampleThe following example shows how to create and query a case insensitive index with English case folding rules:
IMPACT ON DOWNGRADE Downgrade from 3.4 to 3.2 is illegal if the data files contain any collections with a case-insensitive default collation or any case-insensitive indices. Before downgrading, all collections and indices with an associated collation must be dropped. FURTHER INFORMATION Documentation for this feature is available in the 3.3.x development series release notes. To join our beta program for Case Insensitive Indices, and suggest improvements to our implementation, please email beta@mongodb.com. Original descriptionpotential syntax:
reminder, you can aways do this for now:
|
| Comments |
| Comment by Rahul B. Mattte [X] [ 02/Aug/16 ] | ||||||||||||||||
|
I am using below query to fetch the data in alphabetically sorted format collection.col.find( {name: somename}).sort( {name: 1}).toArray(function(error, namelist) {...}) The query returns the sorted data but in case sensitive format. { "name" : "AAA" }, { "name" : "BBB" }, { "name" : "aaa" }Can you please suggest any solution to get the result in insensitive format like below { "name" : "aaa" }, { "name" : "AAA" }, { "name" : "BBB" }Is this reported bug applicable for my issue also..please help me I am new to the mongoDB | ||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 28/Jul/16 ] | ||||||||||||||||
|
Work on this feature is close to completion and scheduled to ship with 3.4. Once it's ready in a development release we will provide documentation and we'd love for you to give it a try. | ||||||||||||||||
| Comment by Oleg Vivtash [ 28/Jul/16 ] | ||||||||||||||||
|
Is there any update on when to expect this feature? | ||||||||||||||||
| Comment by Jesper Erik Bendtsen [ 18/Mar/16 ] | ||||||||||||||||
|
Really hope this feature comes in the near future, everyone are used to search in case insensitive way, it really creates headache for developers! Example in PHP So when users save data I change it to one of the above. And when users searching I change the input to above and then do the search/sort. Know the above is not super, but maybe it will help someone until case insensitive search comes, really hopes it comes!? This ticket is from 2009, it worries me! Please reply with future plans? | ||||||||||||||||
| Comment by Roberto Germano Vieweg Neto [ 09/Mar/16 ] | ||||||||||||||||
|
Another good feature can also be pass a compare function for the sort method, like this:
Sometimes you need to sort little data, and you doesn't want/need to create an index to do that. | ||||||||||||||||
| Comment by Tim Hardy [ 09/Feb/16 ] | ||||||||||||||||
|
For the love of all that is good, PLEASE add case insensitive indexes! Pretty much all web development involves the ability to search by string, and the standard is case insensitive. When you type a search into Amazon, do you expect your search terms to be case sensitive? No, you don't. If you're a web developer and use MongoDb, you have to bend over backwards to provide this incredibly common capability. How Mongo can style itself as an enterprise option nowadays and not support this is beyond me. | ||||||||||||||||
| Comment by Damjan Rems [ 04/Feb/16 ] | ||||||||||||||||
|
I also become more and more dependend on alternative sorting. I have been thinking about the issue and as I remember I have read somewhere that it was so important to MongoDB team to have fast sorting algorithm that they didn't want to complicate sorting process. So lets save data in sorting indexes the way that doesn't change sorting process. My proposal is to replace charachters (binary code) in String keys just before it is saved to index key, with characters provided by collation translation tabels. This also means that find algorithm should do the same operation before index searching is done. The other solution would be to replace characters before they are saved to document. As of MongoDB 3.2 encription is available, which does more or less the same thing. Mongo team. Please we are waiting alternative sorting for too long. by | ||||||||||||||||
| Comment by dane truelson [ 28/Dec/15 ] | ||||||||||||||||
|
Hello guys... This is a very important feature for almost any application that deals with non-numeric search. It does not make sense that I have to store two copies of every person's name just to get a search that is fast enough. | ||||||||||||||||
| Comment by Josef Sábl [ 30/Sep/15 ] | ||||||||||||||||
|
It is very frustrating that this obvious thing is still unresolved. We are actually dumping MongoDB for Elastic Search for our search needs. Congrats | ||||||||||||||||
| Comment by chuong nguyen [ 28/Jul/15 ] | ||||||||||||||||
|
Please help me sort with char uppercase and lowercase | ||||||||||||||||
| Comment by Tuner [ 09/Sep/14 ] | ||||||||||||||||
|
Hope it helps guys, http://derickrethans.nl/mongodb-collation.html | ||||||||||||||||
| Comment by Laurentiu Macovei [ 09/Sep/14 ] | ||||||||||||||||
|
This is a must have in terms of easing very common basic usage. Too bad in 4 years nothing changed | ||||||||||||||||
| Comment by John Page [ 20/Aug/14 ] | ||||||||||||||||
|
I believe are a couple of workarounds to this one of which is to do with the correct pay to page (i.e. never using $skip as many seem to). For a simple case insensitive search create a full text search index on the field, when querying query on the full text search index AND a case insensitive regex on the field to ensure no false positives. When paging - the important thing is to query every time for values greater than the last on the page of results - limit X where X is the page size. Sort and skip has you walking through the results or the index taking longer for each page - search > last value and sort is better (assuming an index to ensure you get them in order) If you use a case Sensitive index for this you do two searches limit pagesize with the first letter changed case then sort the results on display - so if the last thing on the page was 'john.page@mongodb.com' Query for >= 'john.page@mongodb.com' limit 10 Then sort those results and show the first 10. Not perfect but still quite fast. You can only do this to page through results from the start - if you want to jump to a specific page it's a lot harder, but if you are indexing on other fields as well possibly not that hard. | ||||||||||||||||
| Comment by Hendy Irawan [ 05/Jul/14 ] | ||||||||||||||||
|
+1. tshawkins now that you mentioned it, yes we need SERVER-153. but IMHO this one is more priority due to extremely common usage. | ||||||||||||||||
| Comment by Ivan Fioravanti [ 11/Jun/14 ] | ||||||||||||||||
|
Again on this... the fact of not being able to do a case insensitive sort through an index is really becoming a blocking issue in some scenario, especially in big data world. We've already faced 2 situations where we need to perform sort and paging on quite large datasets (3M items) in order to display data in a web app. I imagine this is a nightmare from implementation point of view, but it's a really important feature when you've to deal with large datasets. Thanks | ||||||||||||||||
| Comment by Ivan Fioravanti [ 22/May/14 ] | ||||||||||||||||
|
I love MongoDB too and I'm pushing its usage everywhere in our projects and for our customers. I'm receiving many requests related to Case Insensitive search and I'd really like to have this option built-in the product. For some customers we have used Elasticsearch for this and other usage, but it's too much for standard use. +1 Thanks. | ||||||||||||||||
| Comment by Paco Hernández [ 05/May/14 ] | ||||||||||||||||
|
I am an advocate of MongoDB, I love the product, but I can not understand how something so common & so critical has not been addressed in over 4 years. The solution to duplicate columns doubles the space in disk, besides having to apply additional projections to prevent these fields are returned, and so on. Not an elegant solution, but it is the only one we have. Is there any hope that something will be implemented in the next version of Mongo? +1 To the creation of a special index. Thanks. | ||||||||||||||||
| Comment by Ovidiu Anicai [ 03/Dec/13 ] | ||||||||||||||||
|
You can duplicate the column with all lowercase (maybe also replace characters with accents or special chars) then you do the search with a lowercase string. | ||||||||||||||||
| Comment by Tim Hawkins [ 26/Mar/13 ] | ||||||||||||||||
|
@arkady | ||||||||||||||||
| Comment by Joel Sanderson [ 15/Feb/13 ] | ||||||||||||||||
|
@Clifford - I also like your concept of using a function to compute the index value. I'd love to see this functionality in MongoDB! | ||||||||||||||||
| Comment by Phil Idem [ 08/Oct/12 ] | ||||||||||||||||
|
If this feature and Full-Text Searching ( @Clifford, good suggestion. However, I think the query engine might need a special syntax when using find so that caller can clearly state that they are doing a case-insensitive search. I doubt you want to prevent the ability to do a case-sensitive search just because you have a case-insensitive index. Postgres has a similar feature: Similar to Postgres, I think it should be possible to use the expression without the index (the index should be used to improve performance). Maybe something like this could be used:
| ||||||||||||||||
| Comment by Arkadiy Kukarkin [ 19/Sep/12 ] | ||||||||||||||||
|
The lowercase index suggestion (as well as the current lowercase field workaround) don't really work outside ascii. Given that Reichwaldstraße → reichwaldstrasse should REICHWALDSTRASSE match documents with Reichwaldstraße or reichwaldstrasse? ΌΣΟΣ to όσος or όσοσ? etc. You really need proper unicode case folding for these. That being said, very very needed feature, maybe even in a degraded ascii-only version. | ||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 19/Jul/12 ] | ||||||||||||||||
|
I think indexes on expressionsa re definitely something we want to do. | ||||||||||||||||
| Comment by Idris Mokhtarzada [ 19/Jul/12 ] | ||||||||||||||||
|
+1 for Clifford's computed idea | ||||||||||||||||
| Comment by Clifford Hammerschmidt [ 19/Jul/12 ] | ||||||||||||||||
|
Can we consider making this more general. db.users.ensure_index({name:{order: 1, computed: function() { return this.name.toLowerCase(); }}); ); ); -> returns {_id:ObjectId(...), name: "Bob"} So the index on name, and queries against name, are passed through the "computed" logic. There are issues with using a surrogate field and placing this logic to populate the surrogate field in the applications accessing the db rather than in the indexing logic in the db: Of course having: db.users.ensure_index({name:{order:1, index_type: case_insensitive}}); Where the index_ype is used to implement the index logic in C rather than javascript would be much faster. If you're open to that I'd suggest having javascript computation as a type, e.g.: db.users.ensure_index({name:{order:1, index_type: javascript, javascript: function() { return this.name.toLowerCase();}}}); Or even an FFI version that loads the indexing logic from a plugin: db.users.ensure_index({name:{order:1, index_type: ffi_cdecl, ffi_lib:"custom.so"}}); // C signature might be wrong, it's been a while since I've done C... | ||||||||||||||||
| Comment by jonathan roeder [ 15/May/12 ] | ||||||||||||||||
|
Is it agreed that this shouldn't just be an index feature? Lookup via index or scan should behave consistently. | ||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 10/Apr/12 ] | ||||||||||||||||
|
You can look at "fix for" above to see when it is scheduled. | ||||||||||||||||
| Comment by Sonal Shah [ 06/Apr/12 ] | ||||||||||||||||
|
Any Idea when can it available? Thanks, | ||||||||||||||||
| Comment by Tuner [ 01/Apr/12 ] | ||||||||||||||||
|
+1 Major feature, and extremely important in every non-ascii database... | ||||||||||||||||
| Comment by Sonal Shah [ 29/Mar/12 ] | ||||||||||||||||
|
Any updates on this feature ? When can be it available. I need it badly. Thanks, | ||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 25/Mar/12 ] | ||||||||||||||||
|
If you use utf8_unicode_ci in mysql, I believe it just ignores accents, etc... Or you can choose a specific language collation to get correct sorting. We could do that, I think the discussion is around whether there is a "better" solution. Likely not... | ||||||||||||||||
| Comment by Robert La Ferla [ 25/Mar/12 ] | ||||||||||||||||
|
How do other databases like Oracle/MySQL handle this? MongoDB should be no different. | ||||||||||||||||
| Comment by Colin Mollenhour [ 14/Dec/11 ] | ||||||||||||||||
|
If a field had a case-insensitive index, would queries match case-insensitively only when using the index? If so that could be very confusing.. What about "ci" operators like .find({"foo":{$ciEq:"bar"}}) and .sort( {"foo":2}) and .sort( {"foo":-2}). Sorts and queries that don't specify "ci" operators or +/-2 would either not use the index or use the index only if it is case-sensitive (same behaviour as current). "ci" queries/sorts would use an index if available, but if not available would still do the correct matching/sorting (would be very slow of course). | ||||||||||||||||
| Comment by Remon van Vliet [ 16/Mar/11 ] | ||||||||||||||||
|
I don't think localization is directly related to indexes and I doubt it should be part of ensureIndex syntax. Also, is there a specific spec available on how this will work internally? For example, will turning on the case insensitivity flag simply mean lowercase versions of strings are stored in the index for case insensitive sorts or will the actual index value be replaced with the lowercase version in which case query behaviour is affected. In other words, with this index flag on will find( {a:"Hi"}) and find( {a:"hi"}) return the same document? | ||||||||||||||||
| Comment by Idris Mokhtarzada [ 15/Mar/11 ] | ||||||||||||||||
|
If you read the comments or the end of the description, you'll see there's a workaround that will allow case-insensitive sorting, so no, it's not required. However, that workaround is obviously less than ideal. | ||||||||||||||||
| Comment by Doug Mayer [ 15/Mar/11 ] | ||||||||||||||||
|
Has anyone looked at this in the last 8 months? In order to do case-insensitive sorting on the MongoDB server side, is this not effectively required? | ||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 27/Jul/10 ] | ||||||||||||||||
|
For short term - I would just add a 2nd field that you call .toLower() on before inserting. | ||||||||||||||||
| Comment by Sean Wolfe [ 27/Jul/10 ] | ||||||||||||||||
|
What timeframe is expected to add this feature? Users are complaining about how our sorts are case sensitive. Is there an alternate way of sorting records on a key in a case-insensitive fashion? | ||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 02/Apr/10 ] | ||||||||||||||||
|
trying to keep 1.5/1.6 very focused on sharding + replica sets. |