Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Sprint:
Query 2020-07-27
Case:
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

If we have an index with a certain collation defined, then a search performed using that index will use the collation not only for that field but for other fields as well.

Consider a test collection with its corresponding indexes like this:

> db.coll_insens_search_test.find()
{ "_id" : 1, "A" : "goat", "B" : "core" }
{ "_id" : 2, "A" : "goat", "B" : "CORE" }
{ "_id" : 3, "A" : "GoAt", "B" : "core" }
{ "_id" : 4, "A" : "GOAT", "B" : "CORE" }
{ "_id" : 5, "A" : "GOat", "B" : "core" }

> db.coll_insens_search_test.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.coll_insens_search_test"
	},
	{
		"v" : 2,
		"key" : {
			"B" : 1
		},
		"name" : "HD_CaseInsensitiveIndex",
		"ns" : "test.coll_insens_search_test",
		"collation" : {
			"locale" : "en_US",
			"caseLevel" : false,
			"caseFirst" : "off",
			"strength" : 2,
			"numericOrdering" : false,
			"alternate" : "non-ignorable",
			"maxVariable" : "punct",
			"normalization" : false,
			"backwards" : false,
			"version" : "57.1"
		}
	}
]

The case-insensitive index above is only on field B and not on field A. However, the following query yields all 5 documents:

> db.coll_insens_search_test.find({ A: "GOat", B:"core" }).collation({ locale: "en_US", strength: 2 })
{ "_id" : 1, "A" : "goat", "B" : "core" }
{ "_id" : 2, "A" : "goat", "B" : "CORE" }
{ "_id" : 3, "A" : "GoAt", "B" : "core" }
{ "_id" : 4, "A" : "GOAT", "B" : "CORE" }
{ "_id" : 5, "A" : "GOat", "B" : "core" }

In other words, we see that the case-insensitive search is being performed not only for B on which the index is defined, but also for A which is not covered in the index (it is debatable whether this is the correct behavior). Otherwise it would've only returned the last document with "_id" : 5.

The winningPlan for the above query looks like this:

"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"A" : {
					"$eq" : "GOat"
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"B" : 1
				},
				"indexName" : "HD_CaseInsensitiveIndex",
				"collation" : {
					"locale" : "en_US",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 2,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"B" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"B" : [
						"[\"-EK1\u0001\b\", \"-EK1\u0001\b\"]"
					]
				}
			}
		}

Without using the collation, it yields only the 5th document as expected, as shown here:

> db.coll_insens_search_test.find({ A: "GOat", B:"core" })
{ "_id" : 5, "A" : "GOat", "B" : "core" }

This is a feature request to provide the ability to perform a collation aware search only on the fields on which the index is defined, and not expand it to other fields which are not covered in the index.

duplicates

SERVER-25954 Support more granular collation specification

Backlog

Assignee:: David Storch
Reporter:: Harshad Dhavale
Participants:: David Storch, Harshad Dhavale
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: Jun 23 2020 08:37:13 PM UTC
Updated:: Jul 16 2020 10:24:39 PM UTC
Resolved:: Jul 16 2020 10:24:39 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates