[SERVER-63667] .find() returning multiple instances of the same document Created: 15/Feb/22  Updated: 27/Oct/23  Resolved: 24/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Claudiu Saftoiu Assignee: Backlog - Storage Execution Team
Resolution: Works as Designed Votes: 0
Labels: duplicates, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Arch Linux, 128 GB RAM server


Assigned Teams:
Storage Execution
Participants:

 Description   

Mongo appears to be returning duplicate documents for the same query, i.e. it returns more documents than there are unique _id values in the returned documents:
lobby-brain> count_iterated = 0; ids = {}
{}
lobby-brain> db.the_collection.find({
'a_boolean_key': true
}).forEach((el) => {
count_iterated += 1;
ids[el._id] = (ids[el._id]||0) + 1;
})
lobby-brain> count_iterated
278
lobby-brain> Object.keys(ids).length
251
That is, the cursor returned 278 documents, but only 251 unique _id values.

Investigating further:
lobby-brain> ids
{
'60cb8cb92c909a974a96a430': 1,
'61114dea1a13c86146729f21': 1,
'6111513a1a13c861467d3dcf': 1,
...
'61114c491a13c861466d39cf': 2,
'61114bcc1a13c861466b9f8e': 2,
...
}
lobby-brain> db.the_collection.find({
_id: ObjectId("61114c491a13c861466d39cf")
}).forEach((el) => print("foo"));
foo


That is, there aren't actually duplicate documents with the same _id -- it's just an issue with the .find().

I tried restarting the database, and rebuilding an index involving 'a_boolean_key', with the same results.

I've never seen this before and this seems impossible... 

Version info:
Using MongoDB: 5.0.5
Using Mongosh: 1.0.4

It is a stand-alone database, no replica set or sharding or anything like that.

Further Info

One thing to note: there is a compound index with a_boolean_key as the first field and a datetime field as the second. The boolean key is rarely updated (about once a day), but the datetime field is updated frequently.

Maybe these updates are causing the duplicate return values?



 Comments   
Comment by Claudiu Saftoiu [ 15/Feb/22 ]

Understood! 

In this case `a_boolean_key` wasn't updated frequently, but the 2nd field on the compound index was updated frequently – so I'm presuming the same applies. Good to learn something new!
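
Because a scan that runs concurrently with updates to an indexed field can return the same document more than once, a reader that needs each _id only once could de-duplicate on the client. The following is a minimal sketch, not something taken from this ticket: `uniqueById` is a hypothetical helper, and the sample documents are made up for illustration. It keeps the first version of each document it sees, which may not be the most recent one.

```javascript
// Hypothetical helper: yield each document at most once per _id.
// `docs` is any iterable of documents, e.g. an array from find().toArray().
function* uniqueById(docs) {
  const seenIds = new Set();
  for (const doc of docs) {
    // ObjectId values are objects, so a Set would compare them by reference;
    // key on the string form instead.
    const id = String(doc._id);
    if (seenIds.has(id)) continue; // already returned once, skip the repeat
    seenIds.add(id);
    yield doc;
  }
}

// Made-up sample: the document with _id "x" was returned twice by the scan.
const returnedDocs = [
  { _id: "x", n: 1 },
  { _id: "y", n: 2 },
  { _id: "x", n: 3 },
];
const uniqueDocs = [...uniqueById(returnedDocs)];
```

In mongosh the same idea could be applied to the original query by counting only the first occurrence of each `el._id` inside the `forEach` callback.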

Comment by Louis Williams [ 15/Feb/22 ]

Hi, csaftoiu@gmail.com, if you perform an index scan that is concurrent with updates, it is possible to see duplicate _id values.

If you modify a document's `a_boolean_key`, the new key will change its position in the sorted index. It could move before or after the current position of your cursor. If you encounter this race, you will see the same document twice, but with different values for its `a_boolean_key`.
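
The race described above can be illustrated with a toy model in plain JavaScript. This is only a sketch under stated assumptions, not MongoDB's actual implementation: the "index" is a sorted array of `[a_boolean_key, updated_at, _id]` entries, `updated_at` is a hypothetical name standing in for the datetime field, and the scan resumes after the last key it returned. A writer that moves an already-returned key ahead of the scan position causes the same `_id` to be returned again.

```javascript
// Toy model only: compare two index keys field by field.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Build the sorted "index" from the current state of the documents.
function buildIndex(docs) {
  return docs
    .map((d) => [d.a_boolean_key, d.updated_at, d._id])
    .sort(compareKeys);
}

// Scan in key order, resuming after the last key returned.
// `betweenSteps(step)` stands in for a concurrent writer.
function scanWithConcurrentWrites(docs, betweenSteps) {
  const returnedIds = [];
  let lastKey = null;
  for (let step = 0; ; step++) {
    const index = buildIndex(docs); // re-read the current index state
    const next = index.find(
      (key) => lastKey === null || compareKeys(key, lastKey) > 0
    );
    if (next === undefined) break; // no key beyond the scan position
    returnedIds.push(next[2]);
    lastKey = next;
    betweenSteps(step);
  }
  return returnedIds;
}

const docs = [
  { _id: "a", a_boolean_key: true, updated_at: 1 },
  { _id: "b", a_boolean_key: true, updated_at: 2 },
  { _id: "c", a_boolean_key: true, updated_at: 3 },
];

// After "a" has been returned, a writer bumps its datetime field,
// moving its index key ahead of the scan position.
const returnedIds = scanWithConcurrentWrites(docs, (step) => {
  if (step === 0) docs[0].updated_at = 10;
});
console.log(returnedIds); // [ 'a', 'b', 'c', 'a' ]
```

There are still only three documents in the collection; the fourth result is the same document seen at its new index position.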

Generated at Thu Feb 08 05:58:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.