Core Server: SERVER-61334

Replication batcher uninterruptible lock deadlocks with storage change

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.2.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Replication 2021-11-15
    • 135

      The storage change code takes a global lock, interrupts all operation contexts (opCtxs), and waits for each of them to be destroyed. An opCtx that acquires the global lock uninterruptibly can deadlock with this: the storage change will not release the global lock until the opCtx is destroyed, while the opCtx ignores the interrupt and waits forever for the global lock. Most uses of uninterruptible locks (e.g. for prepared transactions) do not run during initial sync and are not a problem. For some reason, however, the ReplBatcher thread runs continuously even during initial sync, and it requires an uninterruptible lock.
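      A minimal toy model of the deadlock described above, written in C++ as a sketch under stated assumptions: the global lock is modeled as a flag guarded by one std::mutex, there is a single batcher opCtx, and "interrupt" is a flag the waiter may or may not honor. ToyState and storageChangeCompletes are hypothetical names; this is not MongoDB's Lock::GlobalLock or OperationContext code. The real storage change waits with no deadline; the deadline here only turns a hang into an observable false.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical toy state (not MongoDB code). One mutex/condvar guards it all.
struct ToyState {
    std::mutex mu;
    std::condition_variable cv;
    bool globalLockHeld = false;  // held exclusively by the storage change
    bool opCtxKilled = false;     // "interrupt all opCtxs"
    bool opCtxDestroyed = false;  // the batcher's opCtx has gone away
};

// Returns true if the storage change saw the batcher's opCtx destroyed
// before `deadline` expired while it held the global lock.
bool storageChangeCompletes(bool batcherWaitIsInterruptible,
                            std::chrono::milliseconds deadline) {
    ToyState s;
    // Storage change takes the global lock before the batcher thread exists,
    // so the batcher is guaranteed to find it held.
    s.globalLockHeld = true;

    std::thread batcher([&] {
        std::unique_lock<std::mutex> lk(s.mu);
        // An uninterruptible wait ignores opCtxKilled and only ends once the
        // global lock is released.
        s.cv.wait(lk, [&] {
            return !s.globalLockHeld ||
                   (batcherWaitIsInterruptible && s.opCtxKilled);
        });
        // Interrupted or finally unblocked: the batcher's opCtx is destroyed.
        s.opCtxDestroyed = true;
        s.cv.notify_all();
    });

    bool completed;
    {
        std::unique_lock<std::mutex> lk(s.mu);
        s.opCtxKilled = true;  // interrupt all opCtxs
        s.cv.notify_all();
        // Wait for the opCtx to be destroyed while still holding the global lock.
        completed = s.cv.wait_for(lk, deadline, [&] { return s.opCtxDestroyed; });
        // Release the global lock. In the uninterruptible case this is reached
        // only because of the toy deadline.
        s.globalLockHeld = false;
        s.cv.notify_all();
    }
    batcher.join();
    return completed;
}
```

      With batcherWaitIsInterruptible set to false, the call returns false once the deadline expires (the deadlock); with true, the interrupt wakes the batcher, its opCtx is destroyed, and the call returns true.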

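      This can be fixed by having the batcher take the global lock interruptibly before entering the uninterruptible section, so that a pending storage change interrupts the batcher while it waits for the lock rather than while it is inside the uninterruptible section.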
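      A toy C++ sketch of the proposed fix, under simplifying assumptions: the batcher first waits for the global lock interruptibly, and only then enters its uninterruptible section, where a nested acquisition by the current owner is assumed to be re-entrant and never to block. FixState, acquire, release, batcherIteration, storageChange and fixedScenarioCompletes are hypothetical names; the re-entrancy model stands in for MongoDB's real lock manager and is not taken from the actual patch.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical toy state (not MongoDB's lock manager).
struct FixState {
    std::mutex mu;
    std::condition_variable cv;
    std::thread::id owner;       // default-constructed id means "unheld"
    int depth = 0;               // re-entrant acquisitions by `owner`
    bool batcherKilled = false;  // storage change interrupted the batcher's opCtx
    bool batcherOpCtxDestroyed = false;
};

// Acquire the toy global lock. A re-entrant acquisition by the owner never
// waits. Returns false only when an interruptible wait is interrupted.
bool acquire(FixState& s, bool interruptible) {
    std::unique_lock<std::mutex> lk(s.mu);
    const auto self = std::this_thread::get_id();
    if (s.owner == self) {
        ++s.depth;
        return true;
    }
    s.cv.wait(lk, [&] {
        return s.owner == std::thread::id() || (interruptible && s.batcherKilled);
    });
    if (s.owner != std::thread::id()) return false;  // interrupted while waiting
    s.owner = self;
    s.depth = 1;
    return true;
}

void release(FixState& s) {
    std::lock_guard<std::mutex> lk(s.mu);
    if (--s.depth == 0) s.owner = std::thread::id();
    s.cv.notify_all();
}

// One batcher iteration with the fix applied.
void batcherIteration(FixState& s) {
    // Interruptible outer acquisition: a pending storage change can kill us here.
    if (acquire(s, /*interruptible=*/true)) {
        // Uninterruptible section: the nested acquisition is re-entrant, so it
        // cannot wait on the storage change.
        acquire(s, /*interruptible=*/false);
        release(s);
        release(s);
    }
    std::lock_guard<std::mutex> lk(s.mu);
    s.batcherOpCtxDestroyed = true;  // the batcher's opCtx goes away
    s.cv.notify_all();
}

// Storage change: take the global lock, interrupt, wait for the opCtx to be
// destroyed. The deadline exists only so a deadlock would show up as false.
bool storageChange(FixState& s, std::chrono::milliseconds deadline) {
    acquire(s, /*interruptible=*/false);
    bool done;
    {
        std::unique_lock<std::mutex> lk(s.mu);
        s.batcherKilled = true;
        s.cv.notify_all();
        done = s.cv.wait_for(lk, deadline, [&] { return s.batcherOpCtxDestroyed; });
    }
    release(s);
    return done;
}

// Runs both threads in whatever order the scheduler picks.
bool fixedScenarioCompletes(std::chrono::milliseconds deadline) {
    FixState s;
    bool done = false;
    std::thread change([&] { done = storageChange(s, deadline); });
    std::thread batcher([&] { batcherIteration(s); });
    change.join();
    batcher.join();
    return done;
}
```

      fixedScenarioCompletes should return true whichever thread reaches the lock first: either the batcher is interrupted while waiting in its interruptible acquisition, or it finishes its section and releases the lock before the storage change acquires it.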

            Assignee: Matthew Russotto (matthew.russotto@mongodb.com)
            Reporter: Matthew Russotto (matthew.russotto@mongodb.com)
            Votes: 0
            Watchers: 4
