- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- Fully Compatible
- ALL
- Storage Execution 2025-06-23, Storage Execution 2025-07-07, Storage Execution 2025-07-21
- 3
Steady state replication case
Suppose the following sequence occurs on the primary, which assigns recordIds as inserts come in:
- Insert {_id: 1}. Oplog entry: {op: "i", _id: 1, rid: 1}
- Delete {_id: 1}. Oplog entry: {op: "d", _id: 1, rid: 1}
- Kill and restart the primary
- Insert {_id: 2}. After the restart, the primary scans disk for the highest recordId; since no documents exist, the highest is 0, so it assigns recordId(1) again, creating a new oplog entry: {op: "i", _id: 2, rid: 1}.
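The recordId-reuse sequence above can be sketched as a minimal simulation (Python; the class and method names are illustrative, not the real server code). The key assumption, taken from the report, is that a primary derives the next recordId from the highest recordId currently on disk:

```python
# Minimal sketch of recordId reuse. A real primary keeps the counter in
# memory and only rescans disk at startup; here we rescan on every insert,
# which models the post-restart state where the in-memory counter is lost.

class Primary:
    def __init__(self):
        self.docs = {}   # recordId -> document (simulated on-disk table)
        self.oplog = []  # emitted oplog entries

    def insert(self, doc):
        # Highest recordId on disk + 1; with no documents left it is 0,
        # so the next insert gets recordId(1) again.
        rid = max(self.docs, default=0) + 1
        self.docs[rid] = doc
        self.oplog.append({"op": "i", "_id": doc["_id"], "rid": rid})

    def delete(self, _id):
        rid = next(r for r, d in self.docs.items() if d["_id"] == _id)
        del self.docs[rid]
        self.oplog.append({"op": "d", "_id": _id, "rid": rid})

p = Primary()
p.insert({"_id": 1})   # gets rid 1
p.delete(1)            # no documents remain on disk
# (restart happens here; the max-on-disk scan now finds nothing)
p.insert({"_id": 2})   # rid 1 is reused

assert [e["rid"] for e in p.oplog] == [1, 1, 1]
```

All three oplog entries carry rid 1, matching the sequence in the report.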
Now, if the secondary applies these entries as part of a single batch, it may see:
[ {op: "i", _id: 1, rid: 1}, {op: "d", _id: 1, rid: 1}, {op: "i", _id: 2, rid: 1}, ]
The secondary assigns oplog entries to the applier threads based on the hash of the _id. Therefore it is possible that:
Applier thread 1 gets:
[ {op: "i", _id: 1, rid: 1}, {op: "d", _id: 1, rid: 1} ]
Applier thread 2 gets:
[ {op: "i", _id: 2, rid: 1} ]
As a result, the threads can interleave in a way that leaves data corruption: if applier thread 1 deletes the document at recordId(1) after applier thread 2 has inserted {_id: 2} at that same recordId, the newly inserted document is lost.
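The bad interleaving can be demonstrated with a small simulation (Python; the names and the two-thread partitioning scheme are invented for illustration, not the server's actual implementation). It partitions the batch by hash of _id, then replays one possible ordering in which the delete lands after the conflicting insert:

```python
# Sketch: hash-partitioning a batch by _id keeps all ops for one _id on one
# thread, but ops for *different* _ids that share a recordId can be split
# across threads, allowing a corrupting interleaving.

batch = [
    {"op": "i", "_id": 1, "rid": 1},
    {"op": "d", "_id": 1, "rid": 1},
    {"op": "i", "_id": 2, "rid": 1},
]

# Assign each entry to a bucket (applier thread) by hash of its _id.
NUM_THREADS = 2
buckets = [[] for _ in range(NUM_THREADS)]
for entry in batch:
    buckets[hash(entry["_id"]) % NUM_THREADS].append(entry)

# Simulated record store: recordId -> _id of the stored document.
table = {}

def apply_op(entry):
    if entry["op"] == "i":
        table[entry["rid"]] = entry["_id"]   # insert at its recordId
    else:
        table.pop(entry["rid"], None)        # delete by recordId

# One possible cross-thread interleaving:
apply_op(batch[0])  # thread for _id 1: insert {_id: 1} at rid 1
apply_op(batch[2])  # other thread:     insert {_id: 2}, reusing rid 1
apply_op(batch[1])  # thread for _id 1: delete rid 1 -> removes {_id: 2}!

print(table)  # {} -- the {_id: 2} document has been lost
```

Within one thread the ordering is preserved, so each bucket alone is safe; the corruption only appears because the rid-1 delete and the rid-1 insert of a different _id run on different threads with no ordering between them.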
Solutions
See comments
- related to: SERVER-88309 Prevent user from inserting doc via applyOps with recordId that already exists (Open)
- split to: SERVER-107213 RecordIds can be reused (initial sync) (In Progress)