Core Server: SERVER-46218

Race between removal and shutdown in arbiter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.4, 4.3.4
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2
    • Sprint:
      Repl 2020-02-24

      Description

      If an arbiter is shut down soon after it is removed from the replica set by a reconfig, the arbiter crashes and logs:

      [ReplCoord-2] This node is not a member of the config
      [ReplCoord-2] transition to REMOVED from ARBITER
      [ReplCoord-2] terminate() called. An exception is active; attempting to gather more information
      [ReplCoord-2] DBException::toString(): ShutdownInProgress: aborting KeysCollectionManager::PeriodicRunner::setFunc because node is shutting down
      Actual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)91, mongo::ExceptionForCat<(mongo::ErrorCategory)6>, mongo::ExceptionForCat<(mongo::ErrorCategory)7>, mongo::ExceptionForCat<(mongo::ErrorCategory)13> >
      ----- BEGIN BACKTRACE -----
       mongod(_ZN5mongo15printStackTraceERNS_14StackTraceSinkE+0xB4) [0x562227EC2114]
       mongod(_ZN5mongo15printStackTraceERSo+0x2F) [0x562227EC2E2F]
       mongod(+0x2AD2686) [0x562227EC1686]
       mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x562228033266]
       mongod(+0x2CD8589) [0x5622280C7589]
       mongod(__gxx_personality_v0+0x2C5) [0x562228032C85]
       libgcc_s.so.1(+0x10613) [0x7F876CB02613]
       libgcc_s.so.1(_Unwind_Resume+0x125) [0x7F876CB02E95]
       mongod(+0xD73B00) [0x562226162B00]
       mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockINS_5LatchEE+0xFF) [0x562226C5140F]
       mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0x91) [0x562226C53CD1]
       mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x12E) [0x562226C54C7E]
       mongod(+0x1865E93) [0x562226C54E93]
       mongod(+0x2C5FCCF) [0x56222804ECCF]
       libpthread.so.0(+0x76DB) [0x7F876C8DA6DB]
       libc.so.6(clone+0x3F) [0x7F876C60388F]
      -----  END BACKTRACE  -----
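
The mangled frames above can be made readable with the Itanium C++ ABI demangler that ships with GCC and Clang. The following is a minimal sketch, assuming a toolchain that provides `<cxxabi.h>`; the helper name `demangle` is hypothetical and is not part of the server code.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged when it is not a valid mangled name (for example,
// a raw offset frame such as "+0x2AD2686").
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For instance, `demangle("_ZN5mongo10ThreadPool13_consumeTasksEv")` yields `mongo::ThreadPool::_consumeTasks()`, which shows that the exception escaped from a task running on a thread pool worker.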
      

      The sequence on the arbiter is:

      • ReplicationCoordinatorImpl::_heartbeatReconfigFinish
      • ReplicationCoordinatorImpl::_performPostMemberStateUpdateAction with action=kActionRollbackOrRemoved
      • ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook (despite the name, this hook doesn't only run on stepdown)
      • KeysCollectionManager::enableKeyGenerator with doEnable=false
      • KeysCollectionManager::PeriodicRunner::setFunc is called with a lambda
      • The PeriodicRunner throws a shutdown error, which is uncaught and terminates mongod

      I can only reproduce this with an arbiter, not with a data node; I am not sure why.

      Proposed fix: KeysCollectionManager::PeriodicRunner::setFunc catches and logs shutdown errors.
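
The proposed fix might look roughly like the sketch below. It is a simplified, self-contained model, not the actual server code: `PeriodicRunnerSketch`, `ShutdownInProgressError`, `setFuncOrThrow`, and `hasTask` are hypothetical stand-ins, and the real `KeysCollectionManager::PeriodicRunner` and MongoDB's exception types differ. The point it illustrates is only that the shutdown error is caught and logged at the `setFunc` boundary instead of escaping onto the thread pool worker.

```cpp
#include <functional>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the ShutdownInProgress exception in the report.
struct ShutdownInProgressError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified model of KeysCollectionManager::PeriodicRunner.
class PeriodicRunnerSketch {
public:
    using Task = std::function<void()>;

    // Marks the runner as shutting down; later setFunc calls are rejected.
    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _inShutdown = true;
    }

    // Models the reported behavior: throws if the node is shutting down.
    void setFuncOrThrow(Task task) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown) {
            throw ShutdownInProgressError(
                "aborting KeysCollectionManager::PeriodicRunner::setFunc "
                "because node is shutting down");
        }
        _task = std::move(task);
    }

    // Models the proposed fix: catch and log the shutdown error so it cannot
    // propagate out of the calling thread. Returns false if the task was not set.
    bool setFunc(Task task) {
        try {
            setFuncOrThrow(std::move(task));
            return true;
        } catch (const ShutdownInProgressError& ex) {
            std::clog << "Ignoring setFunc during shutdown: " << ex.what() << '\n';
            return false;
        }
    }

    // True if a task is currently installed.
    bool hasTask() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return static_cast<bool>(_task);
    }

private:
    mutable std::mutex _mutex;
    bool _inShutdown = false;
    Task _task;
};
```

In this model, calling `setFunc` after `shutdown()` returns `false` and writes one log line, whereas `setFuncOrThrow` would throw; if that exception went uncaught on a worker thread, the process would terminate as in the backtrace above.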

            People

            Assignee:
            A. Jesse Jiryu Davis
            Reporter:
            A. Jesse Jiryu Davis
            Votes:
            0
            Watchers:
            2