Priority: Major - P3
Resolution: Gone away
Affects Version/s: None
Fix Version/s: None
Linked BF Score: 0
Consider a replica set primary with top of oplog = logical clock = commit point = T. A writer that creates an unreplicated collection (in a KVStorageEngine such as WT) takes the following steps:
- W1: Acquire locks
- W2: Create the collection in the _mdb_catalog
- W3: Commit the transaction
- W4: Run onCommit handlers, setting the min visible timestamp on the collection to T
- W5: Release locks
A reader starting a new transaction and acquiring a collection takes the following steps:
- R1: Open a transaction with a read timestamp of T.
- R2: Acquire locks (I believe these are quickly interrupted if there's lock contention).
- R3: Note that minVisible on the collection = T, which the transaction's read timestamp of T satisfies.
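The reader's final step can be sketched as a timestamp-only comparison. This is a minimal illustration, not the server's code: `is_visible` and the value of `T` are hypothetical names chosen for the example. The point it shows is that the check compares timestamps only and never consults the storage snapshot the reader's transaction is holding.

```python
def is_visible(read_timestamp, min_visible):
    """Return True when a reader at read_timestamp may use the collection.

    Hypothetical helper: models only the timestamp comparison described
    in the reader's last step.
    """
    # Assumption for this sketch: no recorded min visible means always visible.
    return min_visible is None or read_timestamp >= min_visible


T = 10  # arbitrary stand-in for top of oplog = logical clock = commit point

reader_at_t = is_visible(T, T)           # reader at T, collection min visible T
older_reader = is_visible(T - 1, T)      # a reader behind T is rejected
print(reader_at_t, older_reader)
```

Because the collection is unreplicated, creating it does not advance the oplog, so the min visible timestamp equals the read timestamp of a reader that started before the create committed.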
If the reader opens its transaction (R1) before the writer commits (W3) but acquires locks (R2) after the writer releases them (W5), the reader finds the collection in the in-memory catalog, yet its storage engine snapshot, which was taken before the writer's commit, does not contain it. In debug builds, this typically manifests as a crash when trying to access any index.
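The interleaving can be replayed deterministically with a toy model. This is a sketch under stated assumptions, not MongoDB or WiredTiger code: `Storage`, `in_memory_catalog`, and the collection name `local.foo` are hypothetical, and the store models snapshot isolation by pinning a version index when a transaction begins.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Toy MVCC store: each commit appends a new catalog version."""
    versions: list = field(default_factory=lambda: [frozenset()])

    def begin(self):
        # A transaction pins the latest committed version as its snapshot.
        return len(self.versions) - 1

    def commit_create(self, name):
        self.versions.append(self.versions[-1] | {name})

    def exists(self, snapshot, name):
        return name in self.versions[snapshot]


T = 10                      # top of oplog = logical clock = commit point
storage = Storage()
in_memory_catalog = {}      # collection name -> min visible timestamp

# R1: the reader opens its transaction (read timestamp T) before W3.
reader_read_ts = T
reader_snapshot = storage.begin()

# W1-W5: unreplicated create, so the oplog and clock stay at T.
storage.commit_create("local.foo")      # W2 + W3: create and commit
in_memory_catalog["local.foo"] = T      # W4: onCommit sets min visible to T
                                        # W5: locks released

# R2: the reader acquires locks only now, after W5.
# R3: the min visible check passes because T >= T.
catalog_says_exists = (
    "local.foo" in in_memory_catalog
    and reader_read_ts >= in_memory_catalog["local.foo"]
)
storage_says_exists = storage.exists(reader_snapshot, "local.foo")

print(catalog_says_exists, storage_says_exists)  # prints: True False
```

The two answers disagree: the catalog check admits the reader, while the snapshot pinned at R1 predates the create.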
It's unclear how this manifests when arbitrary queries, sorts, or aggregations run against this state, in which the in-memory catalog is inconsistent with the snapshot held by the storage engine's recovery unit.