  1. Core Server
  2. SERVER-33186

Primary node may deadlock during shutdown

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.6.1
    • Fix Version/s: None
    • Component/s: Replication, Stability
    • Labels:

      Description

      Note: this deadlock is similar to SERVER-28688, but it is a different one.
      Note: I observed this deadlock in 3.6.1.

      ReplicationCoordinatorExternalStateImpl::shutdown calls _taskExecutor->join() while holding _threadMutex. In most cases the worker threads have no tasks and _taskExecutor->join() returns immediately. In rare situations, however, DropPendingCollectionReaper has collections to drop, and the signal processing thread keeps _threadMutex locked while these tasks run. If the replication logic decides to step down at this moment, we get a deadlock: ReplicationCoordinatorExternalStateImpl::startProducerIfStopped tries to acquire _threadMutex while holding the global exclusive lock. Once startProducerIfStopped starts waiting for _threadMutex, the drop-collection tasks are blocked on the global lock as well, so _taskExecutor->join() never returns.
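
      The interaction described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the server code: two std::timed_mutex objects stand in for _threadMutex and the global exclusive lock, and the function name simulateShutdownVsStepDown, the Outcome struct and the timeout are hypothetical. The timeout exists only so that the sketch terminates; the real code paths wait without one, which is why the server hangs.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical result type: which of the two blocked parties made progress.
struct Outcome {
    bool stepDownGotThreadMutex;
    bool workerGotGlobalLock;
};

// Sketch of the three-party wait. The calling thread plays the shutdown path.
Outcome simulateShutdownVsStepDown(std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);

    std::timed_mutex threadMutex;  // stands in for _threadMutex
    std::timed_mutex globalLock;   // stands in for the global exclusive lock
    std::promise<void> globalHeld;
    std::promise<void> workerDone;
    std::future<void> globalHeldF = globalHeld.get_future();
    std::future<void> workerDoneF = workerDone.get_future();
    Outcome out{false, false};

    // Shutdown path: take the mutex first, join the worker later.
    std::unique_lock<std::timed_mutex> shutdownLock(threadMutex);

    // Step-down path: holds the global lock, then wants the mutex.
    std::thread stepDown([&] {
        std::unique_lock<std::timed_mutex> global(globalLock);
        globalHeld.set_value();
        if (threadMutex.try_lock_for(timeout)) {
            out.stepDownGotThreadMutex = true;
            threadMutex.unlock();
        }
        // Keep the global lock until the worker has given up, mirroring a
        // step-down that never gets past its wait for the mutex.
        workerDoneF.wait();
    });

    // Worker path: a drop-collection task that needs the global lock.
    std::thread worker([&] {
        globalHeldF.wait();
        if (globalLock.try_lock_for(timeout)) {
            out.workerGotGlobalLock = true;
            globalLock.unlock();
        }
        workerDone.set_value();
    });

    // Equivalent of _taskExecutor->join() while the mutex is still held.
    worker.join();
    stepDown.join();
    return out;
}
```

      Calling simulateShutdownVsStepDown(std::chrono::milliseconds(50)) leaves both fields false: neither the step-down path nor the worker acquires the lock it needs while the shutdown path still holds the mutex.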

      The attached file contains the output of the mongodb-waitsfor-graph, mongodb-show-locks, and mongodb-uniqstack commands. In this file:

      • thread 2 (signalProcessingThread) owns the _threadMutex lock (acquired in ReplicationCoordinatorExternalStateImpl::shutdown)
        and waits for the worker threads to shut down (_taskExecutor->shutdown(); _taskExecutor->join())
      • thread 47 ("replexec-9") waits for _threadMutex (owned by thread 2);
        it is processing the _stepDownFinish event,
        which calls _updateMemberStateFromTopologyCoordinator_inlock,
        which calls startProducerIfStopped,
        which tries to acquire _threadMutex
      • thread 48 (worker thread executing a dropCollection task)
        waits for the global lock owned by thread 47
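
      The three waits listed above form a cycle. As a minimal illustration (not the output or implementation of mongodb-waitsfor-graph), they can be written down as a waits-for graph and checked in a few lines. The names WaitsFor, findCycleFrom, reportedGraph and checkReportedGraph are hypothetical; only the thread numbers and the resources they wait on come from the attached output.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Each blocked thread waits for exactly one other thread here, so the graph
// is a map from the waiting thread id to the thread id it waits for.
using WaitsFor = std::map<int, int>;

// Follows the waits-for edges from `start`. Returns the threads on the cycle
// in wait order if the walk comes back to `start`, otherwise an empty vector.
std::vector<int> findCycleFrom(const WaitsFor& graph, int start) {
    std::vector<int> path;
    std::set<int> seen;
    int current = start;
    while (seen.insert(current).second) {
        path.push_back(current);
        auto it = graph.find(current);
        if (it == graph.end()) {
            return {};  // reached a thread that is not blocked: no deadlock
        }
        current = it->second;
    }
    if (current != start) {
        return {};  // a cycle exists, but `start` only leads into it
    }
    return path;
}

// The edges as reported in this issue.
WaitsFor reportedGraph() {
    return {
        {2, 48},   // thread 2 holds _threadMutex and waits for the worker threads
        {48, 47},  // thread 48 waits for the global lock owned by thread 47
        {47, 2},   // thread 47 waits for _threadMutex owned by thread 2
    };
}

// Usage: the three reported edges form a single cycle.
void checkReportedGraph() {
    const std::vector<int> cycle = findCycleFrom(reportedGraph(), 2);
    assert((cycle == std::vector<int>{2, 48, 47}));
}
```

      Removing any one edge, for example by letting thread 47 skip its wait for _threadMutex, makes findCycleFrom return an empty result.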
