[SERVER-16900] regex query fails on 2.4.5 when field's value is above a certain size Created: 16/Jan/15  Updated: 17/Jan/15  Resolved: 17/Jan/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ted foss Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-5290 fail to insert docs with fields too l... Closed
Operating System: ALL
Steps To Reproduce:

1. On MongoDB 2.4.5 (at least), create a document with a field whose value is longer than 1011 characters.
2. Search using a regex that matches a part of the field.
3. Search using the same regex in a $or[] clause with any other regex.
4. The first search fails, the second succeeds.
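
The test values for these steps can be generated rather than typed by hand. The following is a minimal sketch; `makeBlob` is a hypothetical helper name, not part of the original report, and the padding character simply mirrors the "Blergh111..." value used in the description. It sticks to `new Array(n + 1).join(...)` on the assumption that the 2.4 shell may lack `String.prototype.repeat`. Per the comments on this ticket, the collection needs an index on big_blob for the failure to appear.

```javascript
// Hypothetical helper: build a value of an exact length, "Blergh" followed by "1" padding.
function makeBlob(length) {
    var prefix = "Blergh";
    if (length < prefix.length) {
        throw new Error("length must be at least " + prefix.length);
    }
    // new Array(n + 1).join("1") yields n copies of "1".
    return prefix + new Array(length - prefix.length + 1).join("1");
}

var failing = makeBlob(1012); // one character past the reported threshold
var passing = makeBlob(1011); // at the reported threshold

// In the mongo shell (needs a live server, so left commented out here):
// db.patients.save({ _id: 1, big_blob: failing });
// db.patients.find({ big_blob: /Blergh/ });
// db.patients.find({ $or: [{ big_blob: /Blergh/ }, { fakekey: "fake" }] });
```

Swapping `failing` for `passing` in the `save` call gives the one-character-shorter case for comparison.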

Participants:

 Description   

Querying with a simple regex will not return documents that match if the field size is beyond 1011 characters.

However, searching with the same regex inside a $or clause alongside another condition (one that may match nothing on its own) returns the correct documents.

For example, using the document below where the big_blob value is 1012 characters long:

[Fri Jan 16 17:15:27 ec2-user@:~ ] $ mongo --version
MongoDB shell version: 2.4.5
[Fri Jan 16 17:15:34 ec2-user@:~ ] $ mongod --version
db version v2.4.5
Fri Jan 16 17:15:37.896 git version: a2ddc68ba7c9cee17bfe69ed840383ec3506602b
[Fri Jan 16 17:15:37 ec2-user@:~ ] $ mongo CR
MongoDB shell version: 2.4.5
connecting to: CR
Reporting:PRIMARY> a= {
... "_id": 1,
... "big_blob":
... "Blergh1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"
...
... }
{
	"_id" : 1,
	"big_blob" : "Blergh
}
Reporting:PRIMARY> db.patients.save(a)
Reporting:PRIMARY> db.patients.find({big_blob:/Blergh/})
Reporting:PRIMARY> db.patients.find({$or:[{big_blob:/Blergh/},{fakekey:'fake'}]})
{ "_id" : 1, "big_blob" : "Blergh}
Reporting:PRIMARY>

The first query returns nothing, but the second returns the document, even though both should return the same document.

Compare this to the following example, where the value is one character shorter and both queries return the correct document:

Reporting:PRIMARY> a= {
... "_id": 1,
... "big_blob":
... "Blergh
...
... }
{
	"_id" : 1,
	"big_blob" : "Blergh111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"
}
Reporting:PRIMARY> db.patients.save(a)
Reporting:PRIMARY> db.patients.find({big_blob:/Blergh/})
{ "_id" : 1, "big_blob" : "Blergh111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111" }
Reporting:PRIMARY> db.patients.find({$or:[{big_blob:/Blergh/},{fakekey:'fake'}]})
{ "_id" : 1, "big_blob" : "Blergh}
Reporting:PRIMARY>

I've confirmed this bug does not exist in 2.6.7:



 Comments   
Comment by ted foss [ 16/Jan/15 ]

You are correct, there is an index on that field. That is exactly what was going on. Inserting into a new collection without any index behaves correctly.

Thanks.

Comment by Ramon Fernandez Marina [ 16/Jan/15 ]

tfoss, I think this ticket is a duplicate of SERVER-5290. Can you confirm whether you have an index on big_blob?

If you do, the regex query doesn't show the document because the query uses the index but the document was never added to the index. The $or query uses a table scan, which produces the document.

You can verify what each query is doing by appending .explain(true) to each query.
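
As a rough sketch of that check: the helper below classifies an explain document by its `cursor` field. `describeCursor` is a hypothetical name, and the sketch assumes the 2.4-era explain format, in which `cursor` reads `BasicCursor` for a table scan and starts with `BtreeCursor` for an index scan. Output for a $or query may be structured differently, in which case the helper reports "unknown".

```javascript
// Hypothetical helper: classify a 2.4-style explain() document by its "cursor" field.
function describeCursor(explainDoc) {
    var cursor = explainDoc && explainDoc.cursor;
    if (typeof cursor !== "string") {
        return "unknown"; // field absent or in an unexpected shape
    }
    if (cursor.indexOf("BtreeCursor") === 0) {
        return "index scan";
    }
    if (cursor.indexOf("BasicCursor") === 0) {
        return "table scan";
    }
    return "unknown";
}

// Usage in the mongo shell (needs a live server, so left commented out here):
// db.patients.getIndexes();
// describeCursor(db.patients.find({ big_blob: /Blergh/ }).explain(true));
// describeCursor(db.patients.find({ $or: [{ big_blob: /Blergh/ }, { fakekey: "fake" }] }).explain(true));
```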

Generated at Thu Feb 08 03:42:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.