The MigrationDestinationManager checks out a session and then proceeds to execute the recipient logic while that session is checked out.
At some point, the execution logic reaches a call to waitForWriteConcern, which runs with the session still checked out.
The JournalFlusher wait is not interruptible (and, separately, SERVER-40081 prohibits calling waitForWriteConcern while a session is checked out), so this causes a three-thread deadlock with the replication coordinator:
- T1: MigrationDestinationManager has a session checked out and is waiting in waitForWriteConcern, which in turn is blocked on JournalFlusher::waitForJournalFlush
- T2: The JournalFlusher is waiting to acquire the RSM lock in MODE_IX, which ReplCoord-3 (T3) holds in MODE_X
- T3: ReplCoord-3, while holding the RSM lock in MODE_X, is killing sessions by calling invalidateSessionsForStepdown, which is blocked on the session checked out by T1
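The three waits above form a cycle in a wait-for graph, which is what makes this a deadlock rather than a slow operation. The following is a minimal, hypothetical Python model (not server code; `find_cycle` and the edge labels are illustrative only) that encodes each "waits on" relationship as an edge and checks for a cycle. It also shows that if T1 were not holding the session while waiting (the situation SERVER-40081 is meant to enforce), T3 would not wait on T1 and the cycle would disappear.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph where each thread waits on at most one other."""
    for start in wait_for:
        path: List[str] = []
        seen = set()
        node: Optional[str] = start
        # Follow "waits on" edges until we hit a thread with no outgoing edge
        # or revisit a thread already on this path.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = wait_for.get(node)
        if node is not None:
            # A revisited thread means the suffix of the path starting there is a cycle.
            return path[path.index(node):]
    return None


# Hypothetical edges: "A": "B" means thread A is blocked waiting on thread B.
WAIT_FOR = {
    "T1": "T2",  # waitForWriteConcern -> JournalFlusher::waitForJournalFlush
    "T2": "T3",  # JournalFlusher needs RSM lock in MODE_IX, held in MODE_X by ReplCoord-3
    "T3": "T1",  # invalidateSessionsForStepdown waits for the session T1 has checked out
}

# Same graph, but assuming T1 does not hold a session while waiting,
# so step-down session invalidation has nothing to wait for.
WAIT_FOR_NO_CHECKOUT = {k: v for k, v in WAIT_FOR.items() if k != "T3"}

if __name__ == "__main__":
    print(find_cycle(WAIT_FOR))              # ['T1', 'T2', 'T3']
    print(find_cycle(WAIT_FOR_NO_CHECKOUT))  # None
```

Removing any single edge breaks the cycle; the model only illustrates why the session checkout edge (T3 waiting on T1) is sufficient to close the loop.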
- is depended on by
SERVER-47645 Must invalidate all sessions on step down
- is related to
SERVER-48689 MigrationDestinationManager waits for thread to join with session checked out
- related to
SERVER-73106 [v4.4] Chunk migration attempts to wait for replication with session checked out when getLastErrorDefaults are used in replica set config, leading to server crash