Apologies for summarizing the solution and not the problem.
The update chain for a storage engine key must be in increasing timestamp order (with only a few explicit exceptions). The RecordId counter is initialized lazily, only on the first insert, so a key can be reused. For the `_mdb_catalog`, this reuse can leave an update that cannot be rolled back (an intentionally untimestamped write, Timestamp(0)) sitting on top of an update that must remain eligible for rollback. That sequence of updates "locks in" an update that was never majority committed.
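The key-reuse half of the problem can be sketched with a toy table whose next-RecordId counter is only set on the first insert. `LazyRecordIdTable` and `reset_counter` are hypothetical names, not server code. The sketch also assumes that rollback discards the in-memory counter, which matches the counter being initialized again later in the scenario.

```python
from typing import Dict, Optional


class LazyRecordIdTable:
    """Hypothetical toy table: the next-RecordId counter is initialized lazily,
    from the largest key present at the moment of the first insert."""

    def __init__(self) -> None:
        self.records: Dict[int, str] = {}
        self._next_id: Optional[int] = None  # uninitialized until the first insert

    def reset_counter(self) -> None:
        # Assumption for illustration: reopening the table after a rollback
        # throws away the in-memory counter.
        self._next_id = None

    def insert(self, value: str) -> int:
        if self._next_id is None:
            # Lazy initialization: continue after the largest existing key, or start at 1.
            self._next_id = max(self.records, default=0) + 1
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = value
        return rid

    def delete(self, rid: int) -> None:
        del self.records[rid]


catalog = LazyRecordIdTable()
first = catalog.insert("collection A")               # RecordId(1)
catalog.reset_counter()                              # rollback reopens the catalog
catalog.delete(first)                                # replayed drop empties the table
second = catalog.insert("local.replset.initialsync")
print(first, second)                                 # 1 1
```

Because the table is empty when the counter is finally initialized, the new insert receives the same key that the dropped collection used.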
The scenario:
- Create collection A. This inserts into the `_mdb_catalog` (initializing the counter to "1") at RecordId(1), Timestamp(10).
- Drop collection A at Timestamp(20).
- Rollback to 15.
- Replay the drop of collection A (the `_mdb_catalog` now has no records).
- Create a non-replicated collection (e.g. `local.replset.initialsync`). The counter is initialized to "1" and the catalog entry is inserted at RecordId(1), Timestamp(0).
- Rollback to 15. The untimestamped insert sits on top of the drop at Timestamp(20), so the drop is "locked in" and collection A is missing.
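The scenario above can be traced on a single key's update chain. This is a simplified model, not WiredTiger's actual rollback algorithm: `UpdateChain` and `rollback_to` are hypothetical. The model assumes that rollback pops updates newer than the stable timestamp from the newest end and must stop at an untimestamped update. It also assumes that the replayed drop lands at Timestamp(20), since the scenario does not give that timestamp.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNTIMESTAMPED = 0  # Timestamp(0): intentionally never rolled back


@dataclass
class Update:
    ts: int
    value: Optional[str]  # None models a delete (tombstone)


@dataclass
class UpdateChain:
    """Hypothetical per-key update chain, oldest update first."""
    updates: List[Update] = field(default_factory=list)

    def apply(self, ts: int, value: Optional[str]) -> None:
        self.updates.append(Update(ts, value))

    def in_timestamp_order(self) -> bool:
        stamps = [u.ts for u in self.updates]
        return all(a <= b for a, b in zip(stamps, stamps[1:]))

    def rollback_to(self, stable_ts: int) -> None:
        # Simplified model: discard updates from the newest end while they are
        # newer than stable_ts. An untimestamped update must be kept, so every
        # update beneath it is kept too, even ones newer than stable_ts.
        while self.updates:
            top = self.updates[-1]
            if top.ts == UNTIMESTAMPED or top.ts <= stable_ts:
                break
            self.updates.pop()

    def visible(self) -> Optional[str]:
        return self.updates[-1].value if self.updates else None


rid1 = UpdateChain()                                  # _mdb_catalog key RecordId(1)
rid1.apply(10, "catalog entry: A")                    # create collection A
rid1.apply(20, None)                                  # drop collection A
rid1.rollback_to(15)
print(rid1.visible())                                 # catalog entry: A
rid1.apply(20, None)                                  # replayed drop (timestamp assumed)
rid1.apply(UNTIMESTAMPED, "catalog entry: local.replset.initialsync")
print(rid1.in_timestamp_order())                      # False
rid1.rollback_to(15)
print([u.ts for u in rid1.updates])                   # [10, 20, 0]
print(rid1.visible())                                 # catalog entry: local.replset.initialsync
```

The second rollback cannot remove the Timestamp(20) delete without also removing the Timestamp(0) insert above it, so the non-majority-committed drop survives and collection A's catalog entry is unreachable.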
Related to SERVER-48603: Rollback via refetch can result in out of order timestamps (Closed)