  1. Core Server
  2. SERVER-61334

Replication batcher uninterruptible lock deadlocks with storage change

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.2.0
    • None
    • None
    • None
    • Fully Compatible
    • ALL
    • Replication 2021-11-15
    • 135

    Description

      The storage change code takes a global lock, interrupts all opCtxs, and waits for each opCtx to be destroyed. An operation waiting on an uninterruptible global lock can deadlock with this: the storage change will not release its global lock until the opCtx is destroyed, and the opCtx, which ignores the interrupt, will wait forever for that lock. Most uses of uninterruptible locks (e.g. prepared transactions) do not run during initial sync and so are not a problem, but for some reason the ReplBatcher thread runs continuously even during initial sync, and it requires an uninterruptible lock.

      This can be fixed by having the batcher take an interruptible global lock before entering the uninterruptible section.
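
The locking pattern behind the proposed fix can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's code: `OpCtx`, `lock_interruptibly`, `batcher_step`, and `demo` are hypothetical names, a single reentrant Python lock stands in for the global lock, and polling an event stands in for opCtx interruption. The point it shows is that once the outer acquisition is interruptible, the batcher gives up when the storage change interrupts it, so its opCtx can be destroyed and the storage change can finish.

```python
import threading


class Interrupted(Exception):
    """Raised when an operation is interrupted while waiting for a lock."""


class OpCtx:
    """Toy stand-in for an operation context (hypothetical, not the server's class)."""

    def __init__(self):
        self.killed = threading.Event()     # set when the operation is interrupted
        self.destroyed = threading.Event()  # set once the operation has gone away


def lock_interruptibly(lock, op_ctx, poll_secs=0.01):
    """Wait for `lock`, but give up as soon as `op_ctx` has been interrupted."""
    while not lock.acquire(timeout=poll_secs):
        if op_ctx.killed.is_set():
            raise Interrupted()


def batcher_step(global_lock, op_ctx, log):
    """One batcher iteration: interruptible outer lock, then the uninterruptible part."""
    try:
        lock_interruptibly(global_lock, op_ctx)
        try:
            # Uninterruptible section. Re-acquiring cannot block here because
            # this thread already owns the reentrant lock.
            with global_lock:
                log.append("batcher ran uninterruptible section")
        finally:
            global_lock.release()
    except Interrupted:
        log.append("batcher interrupted")
    finally:
        op_ctx.destroyed.set()  # models destruction of the opCtx


def demo():
    """Play the storage change on the calling thread against one batcher thread."""
    global_lock = threading.RLock()
    op_ctx, log = OpCtx(), []
    batcher = threading.Thread(target=batcher_step, args=(global_lock, op_ctx, log))

    with global_lock:        # storage change takes the global lock
        batcher.start()      # batcher now waits for the lock
        op_ctx.killed.set()  # storage change interrupts the operation
        # Storage change waits for the opCtx to be destroyed. The timeout only
        # keeps the demo from hanging if the pattern were broken.
        destroyed = op_ctx.destroyed.wait(timeout=5)
    batcher.join()
    return destroyed, log
```

In this model `demo()` returns `(True, ["batcher interrupted"])`. If the outer acquisition were a plain blocking `global_lock.acquire()`, the batcher would never notice the interrupt and the storage change would wait until its timeout, which is the deadlock described above.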

          People

            matthew.russotto@mongodb.com Matthew Russotto
            Votes: 0
            Watchers: 4
