[SERVER-24621] Make find command min/max options respect the collation Created: 16/Jun/16 Updated: 13/Aug/16 Resolved: 29/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Hatch | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Query 18 (08/05/16) |
| Participants: |
| Description |
|
If the cursor.min() or cursor.max() options are supplied to the find command with string boundary values, then the string comparisons used to define the boundary should be collation-aware. Consider the following query:
This query respects the case-insensitive US English collation, and therefore strings beginning with both lowercase "x" and uppercase "X" should match. Similarly, min or max queries where the boundary is a nested array or object must respect the collation. For example,
should match documents where str is either {val: "x"} or {val: "X"}. Since min and max require a compatible index, this query will only succeed if there is an index on the collection with the key pattern {str: 1} that has the case-insensitive US English collation. If, however, the min/max boundary values do not include string, nested object, or nested array values, then the index collation is not required to match the query collation. For example, the query
can use any index with the key pattern {a: 1, b: 1}. The index is not required to have the case-insensitive US English collation in order to be compatible. Original descriptioncursor.min and cursor.max both accept an index key parameter. For expression indexes, we don't translate the index key provided by the query through the normal index key generation path. This currently results in the behavior below for hashed indexes, and other expression indexes.
The index key used by min/max corresponds exactly with the key stored in the index. For collation-aware indexes, min/max key queries will need to use ICU-translated comparison keys in order to yield any meaningful results. This is likely non-obvious to users. Additionally, min and max are used internally by sharding, so it is unlikely that changing behavior to translate these keys will be straightforward. In combination with collation, the behavior of min/max index selection, bounds generation and sorting is confusing.
Since the requested collation is case-insensitive (strength: 1), the user may expect both objects containing "a" and "A" to be returned. However, min/max only returns "a" since in the index ordering (according to simple binary comparison) "A" < "a"). |
| Comments |
| Comment by Githook User [ 29/Jul/16 ] |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: |
| Comment by David Hatch [ 15/Jul/16 ] |
|
This becomes complicated further in the presence of multiple indexes with the same collation but different key patterns. min/max will select any index where the fields of the key pattern are a prefix of the min/max query. If multiple possible matches exist, the first is chosen. If there is a simple collated index that can support a min/max query, we should probably choose that instead of a non-simple collated field (to avoid the issues described in this ticket). |
| Comment by David Hatch [ 16/Jun/16 ] |
|
The current behavior results in incorrect (or unexpected) matching of results for min/max queries on collated indexes, but after Since bounds generation for hashed indexes (and other expression indexes) under min/max queries is inconsistent in the same manner as collation-aware indexes, we are deferring a resolution on this ticket until we have a more general solution. |