-
Type:
Improvement
-
Resolution: Duplicate
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query 2020-07-27
-
(copied to CRM)
-
None
-
None
-
None
-
None
-
None
-
None
-
None
If we have an index with a certain collation defined, then a search performed using that index will use the collation not only for that field but for other fields as well.
Consider a test collection with its corresponding indexes like this:
> db.coll_insens_search_test.find()
{ "_id" : 1, "A" : "goat", "B" : "core" }
{ "_id" : 2, "A" : "goat", "B" : "CORE" }
{ "_id" : 3, "A" : "GoAt", "B" : "core" }
{ "_id" : 4, "A" : "GOAT", "B" : "CORE" }
{ "_id" : 5, "A" : "GOat", "B" : "core" }
> db.coll_insens_search_test.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.coll_insens_search_test"
},
{
"v" : 2,
"key" : {
"B" : 1
},
"name" : "HD_CaseInsensitiveIndex",
"ns" : "test.coll_insens_search_test",
"collation" : {
"locale" : "en_US",
"caseLevel" : false,
"caseFirst" : "off",
"strength" : 2,
"numericOrdering" : false,
"alternate" : "non-ignorable",
"maxVariable" : "punct",
"normalization" : false,
"backwards" : false,
"version" : "57.1"
}
}
]
The case-insensitive index above is only on field B and not on field A. However, the following query yields all 5 documents:
> db.coll_insens_search_test.find({ A: "GOat", B:"core" }).collation({ locale: "en_US", strength: 2 })
{ "_id" : 1, "A" : "goat", "B" : "core" }
{ "_id" : 2, "A" : "goat", "B" : "CORE" }
{ "_id" : 3, "A" : "GoAt", "B" : "core" }
{ "_id" : 4, "A" : "GOAT", "B" : "CORE" }
{ "_id" : 5, "A" : "GOat", "B" : "core" }
In other words, we see that the case-insensitive search is being performed not only for B on which the index is defined, but also for A which is not covered in the index (it is debatable whether this is the correct behavior). Otherwise it would've only returned the last document with "_id" : 5.
The winningPlan for the above query looks like this:
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"A" : {
"$eq" : "GOat"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"B" : 1
},
"indexName" : "HD_CaseInsensitiveIndex",
"collation" : {
"locale" : "en_US",
"caseLevel" : false,
"caseFirst" : "off",
"strength" : 2,
"numericOrdering" : false,
"alternate" : "non-ignorable",
"maxVariable" : "punct",
"normalization" : false,
"backwards" : false,
"version" : "57.1"
},
"isMultiKey" : false,
"multiKeyPaths" : {
"B" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"B" : [
"[\"-EK1\u0001\b\", \"-EK1\u0001\b\"]"
]
}
}
}
Without using the collation, it yields only the 5th document as expected, as shown here:
> db.coll_insens_search_test.find({ A: "GOat", B:"core" })
{ "_id" : 5, "A" : "GOat", "B" : "core" }
This is a feature request to provide the ability to perform a collation aware search only on the fields on which the index is defined, and not expand it to other fields which are not covered in the index.
- duplicates
-
SERVER-25954 Support more granular collation specification
-
- Backlog
-