Core Server / SERVER-27807

creating a snapshot and registering it in the replcoord is not synchronous

    • Fully Compatible
    • ALL
    • Storage 2017-02-13
    • 0

      Normally, the replication coordinator keeps track of all WiredTiger snapshots in a vector _uncommittedSnapshots, which is protected by the replcoord mutex. This vector needs to mirror the actual list of snapshots in WiredTiger.
      Dropping all snapshots is also protected by this mutex – the _uncommittedSnapshots vector and the storage engine's list of snapshots are updated at the same time under this mutex lock.
      Creating a new snapshot, however, is not fully protected by this mutex. The snapshot is created in the storage engine outside the mutex, and only afterwards is the replCoord state updated under the mutex. Thus, if dropAllSnapshots() runs concurrently between the time a snapshot is created in WiredTiger and the time it is registered in _uncommittedSnapshots, the system can later attempt to use a snapshot that it believes exists but that is no longer present in WiredTiger.
      Currently, we call dropAllSnapshots() at rollback time, at the beginning of initial sync, and at reconfig time, so any of those actions could trigger an fassert.
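
The interleaving described above can be reproduced deterministically in a small model. The following is a minimal sketch, not the server code: ToyEngine, ToyReplCoord, createSnapshotRacy(), createSnapshotAtomic() and the betweenSteps hook are hypothetical names invented for illustration; only _uncommittedSnapshots and dropAllSnapshots() come from the description. The hook stands in for another thread running inside the unprotected window. Holding the mutex across both steps is shown as one possible way to close the window, not necessarily the fix chosen for this ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical stand-in for the storage engine's own list of snapshots.
class ToyEngine {
public:
    void createSnapshot(std::uint64_t name) { _snapshots.insert(name); }
    void dropAllSnapshots() { _snapshots.clear(); }
    bool hasSnapshot(std::uint64_t name) const { return _snapshots.count(name) != 0; }

private:
    std::set<std::uint64_t> _snapshots;
};

// Hypothetical stand-in for the replication coordinator's bookkeeping.
class ToyReplCoord {
public:
    explicit ToyReplCoord(ToyEngine& engine) : _engine(engine) {}

    // Racy ordering from the description: the engine is updated outside the
    // mutex, and the registry is updated afterwards under the mutex.
    // betweenSteps simulates another thread running in that window.
    void createSnapshotRacy(std::uint64_t name, const std::function<void()>& betweenSteps) {
        _engine.createSnapshot(name);
        betweenSteps();
        std::lock_guard<std::mutex> lk(_mutex);
        _uncommittedSnapshots.push_back(name);
    }

    // One possible remedy (an assumption): do both steps under the mutex.
    void createSnapshotAtomic(std::uint64_t name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _engine.createSnapshot(name);
        _uncommittedSnapshots.push_back(name);
    }

    // Registry and engine are cleared together under the mutex.
    void dropAllSnapshots() {
        std::lock_guard<std::mutex> lk(_mutex);
        _uncommittedSnapshots.clear();
        _engine.dropAllSnapshots();
    }

    // True if every registered snapshot still exists in the engine.
    bool registryMatchesEngine() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (std::uint64_t name : _uncommittedSnapshots) {
            if (!_engine.hasSnapshot(name)) return false;
        }
        return true;
    }

private:
    ToyEngine& _engine;
    std::mutex _mutex;
    std::vector<std::uint64_t> _uncommittedSnapshots;
};

// dropAllSnapshots() lands between engine creation and registration.
bool racyInterleavingStaysConsistent() {
    ToyEngine engine;
    ToyReplCoord coord(engine);
    coord.createSnapshotRacy(1, [&coord] { coord.dropAllSnapshots(); });
    return coord.registryMatchesEngine();
}

// With both steps under the mutex, a drop can only run before or after.
bool atomicCreateStaysConsistent() {
    ToyEngine engine;
    ToyReplCoord coord(engine);
    coord.createSnapshotAtomic(1);
    coord.dropAllSnapshots();
    coord.createSnapshotAtomic(2);
    return coord.registryMatchesEngine();
}

void runDemo() {
    assert(!racyInterleavingStaysConsistent());  // registry names a dropped snapshot
    assert(atomicCreateStaysConsistent());
}
```

In the racy case the registry ends up holding snapshot 1 while the engine holds nothing, which is the state in which a later use of that snapshot would fail.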

            Assignee: Eric Milkie (milkie@mongodb.com)
            Reporter: Eric Milkie (milkie@mongodb.com)