Core Server / SERVER-48144

waitUntilDurable should not take a mutex before taking locks

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc7, 4.7.0
    • Affects Version/s: None
    • Component/s: Storage
    • None
    • Fully Compatible
    • v4.4
    • Execution Team 2020-06-01
    • 22

      Replication is currently running into the following deadlock, in which the LastVote operation uses an uninterruptible lock guard:

      LastVote: holds _lastSyncMutex; its AutoGetCollection is stuck on RSTL IX
      Stepdown: holds RSTL X; stuck on _lastSyncMutex (held by LastVote)

      Regardless of the uninterruptible lock guard, it is bad practice to take a mutex and then locks.
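
      The lock-order inversion described above can be reproduced in miniature. The following is only a sketch using standard C++ primitives, not server code: std::timed_mutex stands in for _lastSyncMutex, std::shared_timed_mutex stands in for the RSTL (shared mode approximating IX, exclusive mode approximating X), and runLockOrderDemo, DemoResult and checkLockOrderDemo are hypothetical names. Timed acquisitions are used so that the demo reports the cycle instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Outcome of each thread's attempt to take its second resource.
struct DemoResult {
    bool lastVoteGotRstl = false;
    bool stepdownGotMutex = false;
};

DemoResult runLockOrderDemo() {
    std::timed_mutex lastSyncMutex;  // stand-in for _lastSyncMutex
    std::shared_timed_mutex rstl;    // stand-in for the RSTL
    const auto timeout = std::chrono::milliseconds(50);

    // One-shot signals used to force the problematic interleaving.
    std::promise<void> mutexHeld, rstlHeld, lastVoteTried, stepdownTried;
    std::future<void> mutexHeldF = mutexHeld.get_future();
    std::future<void> rstlHeldF = rstlHeld.get_future();
    std::future<void> lastVoteTriedF = lastVoteTried.get_future();
    std::future<void> stepdownTriedF = stepdownTried.get_future();

    DemoResult result;

    // "LastVote": takes the mutex first, then wants the lock in shared mode.
    std::thread lastVote([&] {
        std::lock_guard<std::timed_mutex> lk(lastSyncMutex);
        mutexHeld.set_value();
        rstlHeldF.wait();  // wait until "Stepdown" holds the lock exclusively
        if (rstl.try_lock_shared_for(timeout)) {
            result.lastVoteGotRstl = true;
            rstl.unlock_shared();
        }
        lastVoteTried.set_value();
        stepdownTriedF.wait();  // keep the mutex until the other side has tried
    });

    // "Stepdown": takes the lock exclusively first, then wants the mutex.
    std::thread stepdown([&] {
        std::lock_guard<std::shared_timed_mutex> lk(rstl);
        rstlHeld.set_value();
        mutexHeldF.wait();  // wait until "LastVote" holds the mutex
        if (lastSyncMutex.try_lock_for(timeout)) {
            result.stepdownGotMutex = true;
            lastSyncMutex.unlock();
        }
        stepdownTried.set_value();
        lastVoteTriedF.wait();  // keep the lock until the other side has tried
    });

    lastVote.join();
    stepdown.join();
    return result;
}

// Usage: with untimed acquisitions, both threads would block forever.
inline void checkLockOrderDemo() {
    const DemoResult r = runLockOrderDemo();
    assert(!r.lastVoteGotRstl && !r.stepdownGotMutex);
}
```

      Neither thread can make progress while the other holds its first resource, which is why a mutex acquired before locks is hazardous regardless of interruptibility.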

      I recommend one of the following two options (I prefer #2):

      1) waitUntilDurable should not call getToken while holding the mutex. Instead, it should take the mutex only after the getToken call, which takes locks, has returned.
      I think this is workable because getToken (which calls refreshOplogTruncateAfterPointIfPrimary) has its own mutex that makes reading from the oplog and updating the oplogTruncateAfterPoint atomic. waitUntilDurable then calls onDurable with the result of getToken after the journal flush; tokens could then reach onDurable out of order, but replication already has protections against going backwards in time.
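
      Option 1 could look roughly like the sketch below. This is a simplified model and not the server's implementation: DurabilityModel, its members and checkDurabilityModel are hypothetical, a plain std::mutex stands in for the locks that getToken takes, and the compare-and-swap loop in onDurable is an assumed stand-in for replication's protection against going backwards in time.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

class DurabilityModel {
public:
    // Stand-in for getToken(): takes "locks", reads a timestamp, releases them.
    std::uint64_t getToken() {
        std::lock_guard<std::mutex> lk(_locksStandIn);
        return ++_oplogTimestamp;
    }

    // Option 1: obtain the token before taking _lastSyncMutex, so the mutex
    // is never held while waiting for locks.
    void waitUntilDurable() {
        const std::uint64_t token = getToken();
        std::lock_guard<std::mutex> lk(_lastSyncMutex);
        ++_flushes;  // stand-in for the journal flush
        onDurable(token);
    }

    // Only ever moves the durable timestamp forward, so a stale token that
    // arrives late is ignored.
    void onDurable(std::uint64_t token) {
        std::uint64_t cur = _durable.load();
        while (token > cur && !_durable.compare_exchange_weak(cur, token)) {
        }
    }

    std::uint64_t durable() const { return _durable.load(); }

private:
    std::mutex _locksStandIn;
    std::mutex _lastSyncMutex;
    std::uint64_t _oplogTimestamp = 0;  // guarded by _locksStandIn
    int _flushes = 0;                   // guarded by _lastSyncMutex
    std::atomic<std::uint64_t> _durable{0};
};

// Usage: a late, smaller token does not move the durable point backwards.
inline void checkDurabilityModel() {
    DurabilityModel m;
    m.waitUntilDurable();  // token 1
    m.onDurable(5);
    m.onDurable(3);        // stale, ignored
    assert(m.durable() == 5);
}
```

      Because the token is captured before the mutex is taken, two callers may flush in the opposite order from which they read their tokens; the forward-only update in onDurable is what makes that harmless in this model.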

      2) Remove waitUntilDurable's _lastSyncMutex, because it is no longer needed for performance. I haven't verified that performance will be unaffected, but from reading the code I believe it will be. We recently moved writeConcern's waitUntilDurable calls onto the async JournalFlusher thread, which has its own caller batching. With the write callers gone, the only remaining waitUntilDurable callers are one-offs for uncommon replication events, so batching no longer seems necessary in the waitUntilDurable code layer.
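
      To illustrate why caller batching in a flusher thread makes a batching mutex inside waitUntilDurable redundant, here is a minimal sketch. It is not the server's JournalFlusher: JournalFlusherModel, requestFlush, runOnce and checkFlusherModel are hypothetical names, and the flush itself is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

class JournalFlusherModel {
public:
    // Called by writers: register interest in the next flush.
    std::future<void> requestFlush() {
        std::lock_guard<std::mutex> lk(_mutex);
        _waiters.emplace_back();
        return _waiters.back().get_future();
    }

    // Called by the single flusher thread: one flush serves every caller
    // that registered since the previous round. Returns the batch size.
    std::size_t runOnce() {
        std::vector<std::promise<void>> batch;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            batch.swap(_waiters);
        }
        if (batch.empty()) {
            return 0;
        }
        ++_flushCount;  // stand-in for a single waitUntilDurable call
        for (auto& waiter : batch) {
            waiter.set_value();
        }
        return batch.size();
    }

    int flushCount() const { return _flushCount; }

private:
    std::mutex _mutex;  // guards _waiters only; not held during the flush
    std::vector<std::promise<void>> _waiters;
    int _flushCount = 0;  // touched only by the flusher thread
};

// Usage: three requests are satisfied by one flush.
inline void checkFlusherModel() {
    JournalFlusherModel flusher;
    auto a = flusher.requestFlush();
    auto b = flusher.requestFlush();
    auto c = flusher.requestFlush();
    assert(flusher.runOnce() == 3);
    assert(flusher.flushCount() == 1);
    a.get();
    b.get();
    c.get();
}
```

      In this model the flush is only ever invoked from one thread per round, so the flush path itself needs no mutex to coalesce callers.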

            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)