[SERVER-29805] Case insensitive regex search does not take advantage of collation index Created: 23/Jun/17  Updated: 27/Oct/23  Resolved: 23/Jun/17

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 3.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: dane truelson Assignee: Kelsey Schubert
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-9933 Explain that regex is unable to take ... Closed
Operating System: ALL
Steps To Reproduce:

> db.createCollection("bigbug2",{collation:{locale:"en",strength:2}});
{ "ok" : 1 }
> db.bigbug2.insert({a:"bob"});
WriteResult({ "nInserted" : 1 })
> db.bigbug2.insert({a:"james"});
WriteResult({ "nInserted" : 1 })
> db.bigbug2.createIndex({a:1});
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.bigbug2.find({a:/^bo/i}).explain("executionStats");
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "testing.bigbug2",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"a" : {
				"$regex" : "^bo",
				"$options" : "i"
			}
		},
		"collation" : {
			"locale" : "en",
			"caseLevel" : false,
			"caseFirst" : "off",
			"strength" : 2,
			"numericOrdering" : false,
			"alternate" : "non-ignorable",
			"maxVariable" : "punct",
			"normalization" : false,
			"backwards" : false,
			"version" : "57.1"
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"a" : {
					"$regex" : "^bo",
					"$options" : "i"
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"a" : 1
				},
				"indexName" : "a_1",
				"collation" : {
					"locale" : "en",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 2,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"a" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[\"\", {})",
						"[/^bo/i, /^bo/i]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 2,
		"totalDocsExamined" : 2,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"a" : {
					"$regex" : "^bo",
					"$options" : "i"
				}
			},
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 0,
			"works" : 3,
			"advanced" : 1,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 2,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 2,
				"executionTimeMillisEstimate" : 0,
				"works" : 3,
				"advanced" : 2,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"a" : 1
				},
				"indexName" : "a_1",
				"collation" : {
					"locale" : "en",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 2,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"a" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[\"\", {})",
						"[/^bo/i, /^bo/i]"
					]
				},
				"keysExamined" : 2,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "alwaysbcoding",
		"port" : 27017,
		"version" : "3.4.4",
		"gitVersion" : "888390515874a9debd1b6c5d36559ca86b44babd"
	},
	"ok" : 1
}

Participants:

 Description   
Original Summary

Incorrect bounds on regex search with collation index

Original Description

When an index is created with a collation strength of 1 or 2 (case insensitive), bounds are not correctly set when running a regex query like this /^search/i. In the following I search a simple collection like so:

{a:/^bo/i}

. When I run an explain on it, it has incorrect bounds added "[\"\", {})". This causes the index stage to be completely useless, as it selects everything in the whole index, at least in my tests! In the demo below, you can note that the input IXSCAN stage returns 2 results, when it should only return one.

"indexBounds" : {
"a" : [
"[\"\", {})",
"[/^bo/i, /^bo/i]"
]
},



 Comments   
Comment by Charlie Swanson [ 26/Jun/17 ]

Hi dtruel,

We seem to have some similar requests in SERVER-14196, SERVER-14197, and SERVER-9925, but none of them involve collation as far as I can tell. Give a look over those to see if any of those will satisfy your use case. If not, I'd suggest opening a new ticket describing the functionality you're requesting.

Best,
Charlie

Comment by dane truelson [ 23/Jun/17 ]

Hi Thomas, thanks for your quick response. Collation support is awesome, and I'm so grateful that you guys were able to implement it in mongodb. However, it's a significant issue still outstanding that you can't search "starts with" on an index with a collation. For example, I have a CRM type application. People shouldn't have to type a person's full name when searching. Instead, they should be able to type just the first few letters and get a quick result. There is no theoretical reason why this shouldn't work - it is simply a matter of implementing it. Of course, I could always make new fields with a lowercase version, but that is hacky and ugly, and completely nullifies the point in using a collation at all, and definitely goes completely against mongodb's whole mantra of "faster development time". I know I'm not the only person with this use case. Anybody using any form of autocomplete is going to be surprised when this doesn't work, and I suspect that's a lot of people. I think you guys should fast track adding support for this because you have already done the hard part of adding collations, and this would be a great way to top it off. Should I make a new issue requesting this feature?

Comment by Kelsey Schubert [ 23/Jun/17 ]

Hello dtruel,

Thank you for the report. Looking over the output you've provided, it appears that this is the expected behavior. Unfortunately the regex implementation is not collation-aware and therefore cannot use indexes with a collation specified. I've updated the ticket summary to clarify this behavior.

Please note that we've opened DOCS-9933 to clarify this in the documentation.

Kind regards,
Thomas

Generated at Thu Feb 08 04:21:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.