[SERVER-31217] Mongod inconsistency of regex evaluation on WiredTiger (3.4.6, 3.4.9) Created: 22/Sep/17  Updated: 27/Oct/23  Resolved: 22/Sep/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Itzhak Kagan Assignee: Mark Agarunov
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Create a collection with a collation that has "caseLevel": false:

db.createCollection("comments",
	{
		"collation": {
			"locale": "de",
			"caseLevel": false,
			"caseFirst": "off",
			"strength": 2,
			"numericOrdering": false,
			"alternate": "non-ignorable",
			"maxVariable": "punct",
			"normalization": false,
			"backwards": false
		}
	}
)

Insert two documents:

db.comments.insert({"text":"Abcdefghij"})
db.comments.insert({"text":"abcdefghijXyz"})
 
db.comments.find() // returns
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fc"), "text" : "Abcdefghij" }
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fd"), "text" : "abcdefghijXyz" }

Run a simple regex query:

> db.comments.find({"text": /cde/}) // returns
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fc"), "text" : "Abcdefghij" }
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fd"), "text" : "abcdefghijXyz" }

The result makes sense: the collection has a case-insensitive collation, so the server "complements" the regex to be case insensitive.

Run another query that has two expressions:

db.comments.find({"text": /cde/, "text": /jxy/}) // returns no documents

Run a "modified" query that includes a case insensitive flag on the second expression:

db.comments.find({"text": /cde/, "text": /jxy/i}) // returns:
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fd"), "text" : "abcdefghijXyz" }

It looks like the collection's collation is not taken into consideration when the server evaluates the second expression.
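
A side note on the two-expression queries above: in a JavaScript object literal, a repeated key keeps only its last value, so a filter written as {"text": /cde/, "text": /jxy/} would be expected to reach the server with just the second regex. The following is a minimal sketch in plain JavaScript (no server involved) that illustrates this with the two sample strings. The helper matchesAll and the variable names are hypothetical, and the $and form is shown only as one way to keep both expressions, assuming both are meant to apply.

```javascript
// Sample values taken from the two documents in the report.
const texts = ["Abcdefghij", "abcdefghijXyz"];

// Same shape as the filter in the report: the key "text" appears twice.
const duplicateKeyFilter = { "text": /cde/, "text": /jxy/ };

// Only one "text" key survives, and it holds the last regex.
const survivingKeys = Object.keys(duplicateKeyFilter);
const survivingPattern = duplicateKeyFilter.text.source;

// Hypothetical helper: true if the value matches every regex in the list.
const matchesAll = (value, patterns) => patterns.every((re) => re.test(value));

// What the duplicate-key filter effectively tests: only /jxy/ (case sensitive).
const matchedByDuplicateKey = texts.filter((t) =>
  matchesAll(t, [duplicateKeyFilter.text])
);

// An explicit $and keeps both expressions in the filter document.
const andFilter = { $and: [{ text: /cde/ }, { text: /jxy/i }] };
const matchedByAnd = texts.filter((t) =>
  matchesAll(t, andFilter.$and.map((clause) => clause.text))
);

console.log(survivingKeys, survivingPattern);
console.log(matchedByDuplicateKey);
console.log(matchedByAnd);
```

Under this reading, the first two-expression query finds nothing because only /jxy/ is evaluated and the stored text contains "jXy", while the /jxy/i variant matches the second document on its own.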

This behavior was tested on 64-bit Windows with versions 3.4.6 and 3.4.9.

Participants:

 Description   

Environment: Windows 64-bit.

Not all regex expressions consider the collation of a specified collection



 Comments   
Comment by Mark Agarunov [ 22/Sep/17 ]

Hello itzikkg,

Thank you for the report. Looking over the output you've provided, this appears to be the expected behavior. Unfortunately the regex implementation is currently not collation-aware, so the /i flag must be used for case insensitive regex expressions. Note that in your example:

> db.comments.find({"text": /cde/}) // returns
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fc"), "text" : "Abcdefghij" }
{ "_id" : ObjectId("59c5161c3f1fff167ea2a5fd"), "text" : "abcdefghijXyz" }

The query returns both documents because 'cde' is lowercase in both. If the query is changed to match '^abc' instead, only the second document is returned:

> db.comments.find({"text": /^abc/})
{ "_id" : ObjectId("59c535360c9ada7e7923f062"), "text" : "abcdefghijXyz" }

Thanks,
Mark
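
The point in the comment above can be checked without a server. This is a minimal sketch that uses JavaScript's built-in RegExp as a stand-in for the server's regex engine (an assumption; for these simple patterns the case-sensitivity behavior should be the same). The helper matching and the variable names are hypothetical.

```javascript
// Sample values taken from the two documents in the report.
const texts = ["Abcdefghij", "abcdefghijXyz"];

// Hypothetical helper: return the strings that match a given regex.
const matching = (re) => texts.filter((t) => re.test(t));

// 'cde' is lowercase in both strings, so a case-sensitive match finds both.
const lowerSubstring = matching(/cde/);

// '^abc' is case sensitive, so the string starting with "Abc" is excluded.
const anchoredLower = matching(/^abc/);

// Adding the i flag makes the match case insensitive, independent of any collation.
const anchoredInsensitive = matching(/^abc/i);

console.log(lowerSubstring);
console.log(anchoredLower);
console.log(anchoredInsensitive);
```

In the shell, the case-insensitive form corresponds to db.comments.find({"text": /^abc/i}).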
