Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-49046

Provide the ability to perform a collation aware search only on the fields on which the index is defined

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query 2020-07-27

      If we have an index with a certain collation defined, then a search performed using that index will use the collation not only for that field but for other fields as well.

      Consider a test collection with its corresponding indexes like this:

      > db.coll_insens_search_test.find()
      { "_id" : 1, "A" : "goat", "B" : "core" }
      { "_id" : 2, "A" : "goat", "B" : "CORE" }
      { "_id" : 3, "A" : "GoAt", "B" : "core" }
      { "_id" : 4, "A" : "GOAT", "B" : "CORE" }
      { "_id" : 5, "A" : "GOat", "B" : "core" }
      
      > db.coll_insens_search_test.getIndexes()
      [
      	{
      		"v" : 2,
      		"key" : {
      			"_id" : 1
      		},
      		"name" : "_id_",
      		"ns" : "test.coll_insens_search_test"
      	},
      	{
      		"v" : 2,
      		"key" : {
      			"B" : 1
      		},
      		"name" : "HD_CaseInsensitiveIndex",
      		"ns" : "test.coll_insens_search_test",
      		"collation" : {
      			"locale" : "en_US",
      			"caseLevel" : false,
      			"caseFirst" : "off",
      			"strength" : 2,
      			"numericOrdering" : false,
      			"alternate" : "non-ignorable",
      			"maxVariable" : "punct",
      			"normalization" : false,
      			"backwards" : false,
      			"version" : "57.1"
      		}
      	}
      ]
      

      The case-insensitive index above is only on field B and not on field A. However, the following query yields all 5 documents:

      > db.coll_insens_search_test.find({ A: "GOat", B:"core" }).collation({ locale: "en_US", strength: 2 })
      { "_id" : 1, "A" : "goat", "B" : "core" }
      { "_id" : 2, "A" : "goat", "B" : "CORE" }
      { "_id" : 3, "A" : "GoAt", "B" : "core" }
      { "_id" : 4, "A" : "GOAT", "B" : "CORE" }
      { "_id" : 5, "A" : "GOat", "B" : "core" }
      

      In other words, we see that the case-insensitive search is being performed not only for B on which the index is defined, but also for A which is not covered in the index (it is debatable whether this is the correct behavior). Otherwise it would've only returned the last document with "_id" : 5.

       
      The winningPlan for the above query looks like this:

      "winningPlan" : {
      			"stage" : "FETCH",
      			"filter" : {
      				"A" : {
      					"$eq" : "GOat"
      				}
      			},
      			"inputStage" : {
      				"stage" : "IXSCAN",
      				"keyPattern" : {
      					"B" : 1
      				},
      				"indexName" : "HD_CaseInsensitiveIndex",
      				"collation" : {
      					"locale" : "en_US",
      					"caseLevel" : false,
      					"caseFirst" : "off",
      					"strength" : 2,
      					"numericOrdering" : false,
      					"alternate" : "non-ignorable",
      					"maxVariable" : "punct",
      					"normalization" : false,
      					"backwards" : false,
      					"version" : "57.1"
      				},
      				"isMultiKey" : false,
      				"multiKeyPaths" : {
      					"B" : [ ]
      				},
      				"isUnique" : false,
      				"isSparse" : false,
      				"isPartial" : false,
      				"indexVersion" : 2,
      				"direction" : "forward",
      				"indexBounds" : {
      					"B" : [
      						"[\"-EK1\u0001\b\", \"-EK1\u0001\b\"]"
      					]
      				}
      			}
      		}
      

       

      Without using the collation, it yields only the 5th document as expected, as shown here:

      > db.coll_insens_search_test.find({ A: "GOat", B:"core" })
      { "_id" : 5, "A" : "GOat", "B" : "core" }
      

       

      This is a feature request to provide the ability to perform a collation aware search only on the fields on which the index is defined, and not expand it to other fields which are not covered in the index.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            harshad.dhavale@mongodb.com Harshad Dhavale
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: