  Core Server / SERVER-48125

Stepdown can deadlock with storing lastVote via journal flusher


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Linked BF Score:
      25

      Description

      As part of storing the lastVote document, we wait for the write to become durable and eventually call into refreshOplogTruncateAfterPointIfPrimary, which acquires a global IX lock as part of an AutoGetCollection. This can deadlock with stepdown:

      • The lastVote thread holds the mutex that protects the wait-for-durability critical section, and it blocks acquiring the global IX lock behind stepdown.
      • Stepdown tries to clear the oplog truncate after point, which in turn waits on a journal flush.
      • The journal flusher also needs to wait for durability, but it cannot enter the critical section because the lastVote thread already holds that mutex.

      We recently made storing the lastVote document fully uninterruptible in SERVER-47612, so the lastVote thread cannot be interrupted to break this cycle.
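      The three-way wait described above can be modeled as a wait-for graph: each blocked thread points at the thread it is waiting on, and a deadlock is a cycle. The sketch below is only an illustrative model, not server code; the thread labels ("lastVote", "stepdown", "journalFlusher") and the find_cycle helper are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each blocked thread maps to (thread it waits on, why).
# Thread labels are descriptive, not MongoDB identifiers.
WAITS_FOR: Dict[str, Tuple[str, str]] = {
    # Holds the durability mutex; blocked acquiring the global IX lock
    # (AutoGetCollection in refreshOplogTruncateAfterPointIfPrimary).
    "lastVote": ("stepdown", "global IX lock blocked behind stepdown"),
    # Clearing the oplog truncate after point waits on a journal flush.
    "stepdown": ("journalFlusher", "waits for a journal flush"),
    # Needs the durability critical section, whose mutex lastVote holds.
    "journalFlusher": ("lastVote", "durability mutex held by lastVote"),
}


def find_cycle(waits_for: Dict[str, Tuple[str, str]], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the closed cycle, or None."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in path:
        path.append(node)
        edge = waits_for.get(node)
        node = edge[0] if edge else None
    if node is None:
        return None  # chain ended at a thread that is not blocked
    return path[path.index(node):] + [node]


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "lastVote")
    print(" -> ".join(cycle) if cycle else "no deadlock")
    # Dropping any single edge (e.g. if the lastVote wait were interruptible)
    # breaks the cycle.
    relaxed = {k: v for k, v in WAITS_FOR.items() if k != "lastVote"}
    print(find_cycle(relaxed, "lastVote"))
```

      Running it prints "lastVote -> stepdown -> journalFlusher -> lastVote" and then "None". Because SERVER-47612 made the lastVote write uninterruptible, none of the three edges can be dropped at runtime in this model, which is why the cycle never resolves on its own.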

              People

              Assignee:
              Backlog - Replication Team
              Reporter:
              Vesselina Ratcheva
              Votes:
              0
              Watchers:
              8