[SERVER-27807] creating a snapshot and registering it in the replcoord is not synchronous Created: 25/Jan/17  Updated: 05/Apr/17  Resolved: 25/Jan/17

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 3.2.13, 3.4.3, 3.5.2

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-20844 Start ReplSetTests faster wrt initial... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Storage 2017-02-13
Participants:
Linked BF Score: 0

 Description   

Normally, the replication coordinator keeps track of all WiredTiger snapshots in a vector, _uncommittedSnapshots, which is protected by the replcoord mutex. This vector must mirror the actual list of snapshots in WiredTiger.
Dropping all snapshots is also protected by this mutex: the _uncommittedSnapshots vector and the storage engine's list of snapshots are updated together while the lock is held.
Creating a new snapshot, however, is not fully protected by this mutex. The snapshot is created in the storage engine outside the mutex, and only afterwards is the replCoord state updated under the lock. Thus, if dropAllSnapshots() runs concurrently after a snapshot has been created in WiredTiger but before it has been registered in _uncommittedSnapshots, the system can later attempt to use a snapshot that it believes exists but that is no longer present in WiredTiger.
Currently, we call dropAllSnapshots() during rollback, at the start of initial sync, and during reconfig, so any of those operations could trigger an fassert if it races with snapshot creation.



 Comments   
Comment by Githook User [ 16/Feb/17 ]

Author: Eric Milkie (milkie, milkie@10gen.com)

Message: SERVER-27807 synchronize creating a snapshot with its registration in replcoord

This commit prevents a race between creating a snapshot in the storage engine and
registering it in the replication coordinator. The replication coordinator maintains
a vector of outstanding snapshots, and it needs to stay in sync with the actual snapshots
in the storage engine. The replication coordinator mutex is used to ensure this synchronization.

(cherry picked from commit 8253fab192fad307a07846878e368e970990d7b3)
Branch: v3.2
https://github.com/mongodb/mongo/commit/95d4a0a472e436dad9380db917145f121d8c3720

Comment by Githook User [ 08/Feb/17 ]

Author: Eric Milkie (milkie, milkie@10gen.com)

Message: SERVER-27807 synchronize creating a snapshot with its registration in replcoord

This commit prevents a race between creating a snapshot in the storage engine and
registering it in the replication coordinator. The replication coordinator maintains
a vector of outstanding snapshots, and it needs to stay in sync with the actual snapshots
in the storage engine. The replication coordinator mutex is used to ensure this synchronization.

(cherry picked from commit 8253fab192fad307a07846878e368e970990d7b3)
Branch: v3.4
https://github.com/mongodb/mongo/commit/c29ab24ab04e44fca58e9987645ad45166b14bee

Comment by Githook User [ 25/Jan/17 ]

Author: Eric Milkie (milkie, milkie@10gen.com)

Message: SERVER-27807 synchronize creating a snapshot with its registration in replcoord

This commit prevents a race between creating a snapshot in the storage engine and
registering it in the replication coordinator. The replication coordinator maintains
a vector of outstanding snapshots, and it needs to stay in sync with the actual snapshots
in the storage engine. The replication coordinator mutex is used to ensure this synchronization.
Branch: master
https://github.com/mongodb/mongo/commit/8253fab192fad307a07846878e368e970990d7b3

Generated at Thu Feb 08 04:16:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.