  1. Core Server
  2. SERVER-48641

Deadlock due to the MigrationDestinationManager waiting for write concern with the session checked-out

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.1, 4.7.0
    • Affects Version/s: 4.4.0-rc8
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4
    • Sprint: Sharding 2020-07-13, Sharding 2020-07-27
    • 40

      The MigrationDestinationManager checks out a session and then proceeds to execute the recipient logic while that session remains checked out.

      At some point the execution logic reaches a call to waitForWriteConcern, which runs with the session still checked out.

      Because the JournalFlusher wait is non-interruptible (and because SERVER-40081 prohibits calling waitForWriteConcern while a session is checked out), this causes a three-thread deadlock involving the replication coordinator:

      • T1: MigrationDestinationManager has a session checked out and is waiting in waitForWriteConcern, which in turn is blocked in JournalFlusher::waitForJournalFlush
      • T2: The JournalFlusher is waiting to acquire the RSM lock in MODE_IX, which is held in MODE_X by ReplCoord-3
      • T3: ReplCoord-3, while holding the RSM lock in MODE_X, is killing sessions by calling invalidateSessionsForStepdown, which is blocked on the session checked out by T1
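
The three waits above form a cycle, which can be sketched as a small wait-for graph. This is only an illustrative model, not MongoDB code: the thread labels are taken from the list above, while WaitForGraph, buildTicketGraph, and findCycle are hypothetical names introduced here, and the model assumes each thread blocks on at most one other thread.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified model (an assumption of this sketch): each blocked thread waits
// on exactly one other thread. An entry "A -> B" means A cannot proceed until
// B makes progress.
using WaitForGraph = std::map<std::string, std::string>;

// Encodes the three waits described in the ticket.
WaitForGraph buildTicketGraph() {
    return {
        // T1 waits for the journal flush performed by T2.
        {"T1:MigrationDestinationManager", "T2:JournalFlusher"},
        // T2 wants the RSM lock in MODE_IX, held in MODE_X by T3.
        {"T2:JournalFlusher", "T3:ReplCoord-3"},
        // T3 waits for the session that T1 has checked out.
        {"T3:ReplCoord-3", "T1:MigrationDestinationManager"},
    };
}

// Follows wait-for edges starting at `start`. Returns the threads that form a
// cycle if the chain closes on itself, or an empty vector if the chain ends at
// a thread that is not blocked.
std::vector<std::string> findCycle(const WaitForGraph& graph, const std::string& start) {
    std::vector<std::string> path;
    std::string current = start;
    while (true) {
        auto seen = std::find(path.begin(), path.end(), current);
        if (seen != path.end()) {
            // Revisiting a thread means every thread from that point on is
            // waiting, directly or indirectly, on itself.
            return std::vector<std::string>(seen, path.end());
        }
        path.push_back(current);
        // Every thread on the path except possibly the last one is a waiter.
        assert(path.size() <= graph.size() + 1);
        auto next = graph.find(current);
        if (next == graph.end()) {
            return {};  // `current` is not blocked, so this chain can drain.
        }
        current = next->second;
    }
}
```

Calling findCycle(buildTicketGraph(), "T1:MigrationDestinationManager") returns all three threads. Erasing the T3 entry, which models the session no longer being held while T1 waits, makes the same call return an empty vector, since the chain then ends at a thread that can make progress.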

            jack.mulrow@mongodb.com Jack Mulrow
            kaloian.manassiev@mongodb.com Kaloian Manassiev