Core Server / SERVER-28181

Deadlock involving the mutexes of oplog fetcher and replication coordinator

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.4, 3.5.5
    • Component/s: Replication
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Sprint:
      Repl 2017-03-27
    • Linked BF Score:
      0

      Description

      The replication coordinator stops bgsync, which in turn stops the oplog fetcher if one is running. The oplog fetcher needs the current term and the last committed optime to make new requests. As a result, the two can deadlock:

      • Replication coordinator, while holding replCoord's mutex, waits on oplog fetcher's mutex to stop it.
      • Oplog fetcher, while holding its mutex, waits on replCoord's mutex to get the current term and the last committed optime.
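
      The cycle described in the two bullets above can be reproduced in miniature. This is only a sketch, not the server code: the two global mutexes and the helper names are hypothetical stand-ins, and timed locks are used so that the demonstration reports the cycle instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the description.
std::timed_mutex replCoordMutex;
std::timed_mutex oplogFetcherMutex;

// Locks `first`, waits until the peer thread holds its own lock, then tries
// to lock `second` with a timeout. `first` stays held until both threads have
// finished their attempt, which mimics a thread blocked inside the wait.
bool holdThenTry(std::timed_mutex& first, std::timed_mutex& second,
                 std::atomic<int>& holders, std::atomic<int>& done) {
    std::lock_guard<std::timed_mutex> lk(first);
    holders.fetch_add(1);
    while (holders.load() < 2) std::this_thread::yield();

    bool acquired = second.try_lock_for(std::chrono::milliseconds(50));
    if (acquired) second.unlock();

    done.fetch_add(1);
    while (done.load() < 2) std::this_thread::yield();
    return acquired;
}

// Returns true when neither thread could take the other's mutex, i.e. the
// opposite lock orders formed a cycle. With plain blocking locks both threads
// would wait forever at this point.
bool demonstratesLockCycle() {
    std::atomic<int> holders{0};
    std::atomic<int> done{0};
    bool coordGotFetcher = true;
    bool fetcherGotCoord = true;

    // "Replication coordinator": holds replCoord's mutex, wants the fetcher's.
    std::thread coord([&] {
        coordGotFetcher = holdThenTry(replCoordMutex, oplogFetcherMutex, holders, done);
    });
    // "Oplog fetcher": holds its own mutex, wants replCoord's.
    std::thread fetcher([&] {
        fetcherGotCoord = holdThenTry(oplogFetcherMutex, replCoordMutex, holders, done);
    });
    coord.join();
    fetcher.join();
    return !coordGotFetcher && !fetcherGotCoord;
}
```

      Calling demonstratesLockCycle() from a test program should show both attempts timing out, because each thread keeps its first mutex while waiting for the second.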

      To fix this, we need to move the current term and last committed optime out of the oplog fetcher's mutex, so that the fetcher does not hold its own mutex while waiting on replCoord's mutex.
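
      One way to read the proposed fix is sketched below. The class and member names (ReplCoord, OplogFetcher, FetchRequest, makeNextRequest and so on) are hypothetical and are not taken from the actual patch; the point is only the ordering: the fetcher asks the coordinator for the term and last committed optime before it takes its own mutex, so it never holds one mutex while waiting on the other.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical snapshot of what the fetcher needs from the coordinator.
struct ReplMetadata {
    std::int64_t term;
    std::int64_t lastCommittedTs;
};

// Hypothetical, heavily simplified replication coordinator.
class ReplCoord {
public:
    void update(std::int64_t term, std::int64_t lastCommittedTs) {
        std::lock_guard<std::mutex> lk(_mutex);
        _meta = ReplMetadata{term, lastCommittedTs};
    }
    ReplMetadata getTermAndLastCommitted() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _meta;
    }

private:
    mutable std::mutex _mutex;
    ReplMetadata _meta{0, 0};
};

struct FetchRequest {
    std::int64_t term;
    std::int64_t lastCommittedTs;
    std::int64_t lastFetchedTs;
};

// Hypothetical, heavily simplified oplog fetcher.
class OplogFetcher {
public:
    explicit OplogFetcher(const ReplCoord& replCoord) : _replCoord(replCoord) {}

    FetchRequest makeNextRequest() {
        // Query the coordinator first, with no fetcher lock held.
        ReplMetadata meta = _replCoord.getTermAndLastCommitted();
        // Only the fetcher's own state is read under the fetcher's mutex.
        std::lock_guard<std::mutex> lk(_mutex);
        return FetchRequest{meta.term, meta.lastCommittedTs, _lastFetchedTs};
    }

    // Called by the coordinator (possibly while it holds its own mutex).
    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = false;
    }

    bool isActive() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

private:
    const ReplCoord& _replCoord;
    mutable std::mutex _mutex;
    std::int64_t _lastFetchedTs = 0;
    bool _active = true;
};
```

      With this ordering the only nested acquisition left is coordinator mutex then fetcher mutex (in shutdown), so the two mutexes are always taken in the same order and the cycle cannot form.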
