Normally, the replication coordinator keeps track of all WiredTiger snapshots in a vector _uncommittedSnapshots, which is protected by the replcoord mutex. This vector needs to mirror the actual list of snapshots in WiredTiger.
Dropping all snapshots is also protected by this mutex: the _uncommittedSnapshots vector and the storage engine's list of snapshots are both updated while the mutex is held.
Creating a new snapshot, however, is not completely protected by this mutex. The snapshot is first created in the storage engine outside of the mutex lock, and only afterwards is the replcoord state updated under the mutex lock. Thus, if dropAllSnapshots() is called asynchronously between the time a snapshot is created in WiredTiger and the time it is registered in _uncommittedSnapshots, the snapshot is dropped from WiredTiger but still gets added to the vector. The system can then later attempt to use a snapshot that it thinks exists but that is not actually present in WiredTiger.
Currently, we call dropAllSnapshots() at rollback time, at the beginning of initial sync, and at reconfig time, so any of those actions could trigger an fassert.
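
The interleaving described above can be replayed deterministically with a small model. This is only a sketch and not the server's actual code: FakeStorageEngine, FakeReplCoord, createSnapshotInEngine(), registerSnapshot(), isConsistent() and replay() are hypothetical names invented for illustration, and the two-step creation is simplified to two separate calls so the drop can be placed between them on a single thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical stand-in for the storage engine's own list of snapshots.
class FakeStorageEngine {
public:
    void createSnapshot(std::uint64_t name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _snapshots.insert(name);
    }
    void dropAllSnapshots() {
        std::lock_guard<std::mutex> lk(_mutex);
        _snapshots.clear();
    }
    bool hasSnapshot(std::uint64_t name) const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _snapshots.count(name) > 0;
    }

private:
    mutable std::mutex _mutex;
    std::set<std::uint64_t> _snapshots;
};

// Hypothetical stand-in for the replication coordinator's bookkeeping.
class FakeReplCoord {
public:
    explicit FakeReplCoord(FakeStorageEngine& engine) : _engine(engine) {}

    // Step 1 of creation: touches only the engine, replcoord mutex NOT held.
    void createSnapshotInEngine(std::uint64_t name) {
        _engine.createSnapshot(name);
    }

    // Step 2 of creation: records the snapshot under the replcoord mutex.
    void registerSnapshot(std::uint64_t name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _uncommittedSnapshots.push_back(name);
    }

    // Clears the vector and the engine's list together under the mutex.
    void dropAllSnapshots() {
        std::lock_guard<std::mutex> lk(_mutex);
        _uncommittedSnapshots.clear();
        _engine.dropAllSnapshots();
    }

    // True if every snapshot tracked in the vector still exists in the engine.
    bool isConsistent() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return std::all_of(_uncommittedSnapshots.begin(),
                           _uncommittedSnapshots.end(),
                           [this](std::uint64_t name) {
                               return _engine.hasSnapshot(name);
                           });
    }

private:
    FakeStorageEngine& _engine;
    mutable std::mutex _mutex;
    std::vector<std::uint64_t> _uncommittedSnapshots;
};

// Replays snapshot creation, optionally with a drop landing between the
// two creation steps. Returns whether the vector still mirrors the engine.
bool replay(bool dropBetweenSteps) {
    FakeStorageEngine engine;
    FakeReplCoord coord(engine);

    coord.createSnapshotInEngine(1);   // snapshot now exists in the engine
    if (dropBetweenSteps) {
        coord.dropAllSnapshots();      // removes it from the engine
    }
    coord.registerSnapshot(1);         // vector now claims snapshot 1 exists
    return coord.isConsistent();
}
```

Calling replay(false) models the normal path, where the vector and the engine agree. Calling replay(true) models the race: the vector ends up holding a snapshot name that the engine no longer has, which is the state in which a later use of that snapshot would fail. In this model the inconsistency would not arise if both creation steps ran under the same mutex that dropAllSnapshots() takes.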
is related to: SERVER-20844 Start ReplSetTests faster wrt initial election (Closed)