Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Replication
None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2
Sprint: Repl 2020-02-24
If an arbiter is shut down soon after it is removed from the replica set by a reconfig, the arbiter crashes and logs:
[ReplCoord-2] This node is not a member of the config
[ReplCoord-2] transition to REMOVED from ARBITER
[ReplCoord-2] terminate() called. An exception is active; attempting to gather more information
[ReplCoord-2] DBException::toString(): ShutdownInProgress: aborting KeysCollectionManager::PeriodicRunner::setFunc because node is shutting down
Actual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)91, mongo::ExceptionForCat<(mongo::ErrorCategory)6>, mongo::ExceptionForCat<(mongo::ErrorCategory)7>, mongo::ExceptionForCat<(mongo::ErrorCategory)13> >
----- BEGIN BACKTRACE -----
mongod(_ZN5mongo15printStackTraceERNS_14StackTraceSinkE+0xB4) [0x562227EC2114]
mongod(_ZN5mongo15printStackTraceERSo+0x2F) [0x562227EC2E2F]
mongod(+0x2AD2686) [0x562227EC1686]
mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x562228033266]
mongod(+0x2CD8589) [0x5622280C7589]
mongod(__gxx_personality_v0+0x2C5) [0x562228032C85]
libgcc_s.so.1(+0x10613) [0x7F876CB02613]
libgcc_s.so.1(_Unwind_Resume+0x125) [0x7F876CB02E95]
mongod(+0xD73B00) [0x562226162B00]
mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockINS_5LatchEE+0xFF) [0x562226C5140F]
mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0x91) [0x562226C53CD1]
mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x12E) [0x562226C54C7E]
mongod(+0x1865E93) [0x562226C54E93]
mongod(+0x2C5FCCF) [0x56222804ECCF]
libpthread.so.0(+0x76DB) [0x7F876C8DA6DB]
libc.so.6(clone+0x3F) [0x7F876C60388F]
----- END BACKTRACE -----
The sequence of calls on the arbiter is:
- ReplicationCoordinatorImpl::_heartbeatReconfigFinish
- ReplicationCoordinatorImpl::_performPostMemberStateUpdateAction with action=kActionRollbackOrRemoved
- ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook (despite the name, this hook doesn't only run on stepdown)
- KeysCollectionManager::enableKeyGenerator with doEnable=false
- KeysCollectionManager::PeriodicRunner::setFunc is called with a lambda
- PeriodicRunner::setFunc throws a ShutdownInProgress error, which nothing up the call chain catches, so mongod terminates
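The call chain above can be modeled with a small, self-contained sketch. Every type and function here is a hypothetical, heavily simplified stand-in, not mongod's real code: ShutdownInProgress is a plain std::runtime_error subclass, PeriodicRunner only records whether shutdown has begun, and the no-op lambda passed by enableKeyGenerator is an assumption (the ticket only says "a lambda"). The point it illustrates is that no layer between setFunc and the thread-pool task catches the exception.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for mongod's ShutdownInProgress error (code 91).
struct ShutdownInProgress : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified stand-in for KeysCollectionManager::PeriodicRunner.
class PeriodicRunner {
public:
    void beginShutdown() { _inShutdown = true; }

    // Mirrors the logged behavior: once shutdown has begun, setFunc throws.
    void setFunc(std::function<void()> func) {
        if (_inShutdown) {
            throw ShutdownInProgress(
                "aborting KeysCollectionManager::PeriodicRunner::setFunc "
                "because node is shutting down");
        }
        _func = std::move(func);
    }

private:
    bool _inShutdown = false;
    std::function<void()> _func;
};

// Each layer only forwards the call; none of them catches the exception.
void enableKeyGenerator(PeriodicRunner& runner, bool doEnable) {
    if (!doEnable) {
        runner.setFunc([] { /* assumed no-op while the generator is disabled */ });
    }
}

void shardingOnStepDownHook(PeriodicRunner& runner) {
    enableKeyGenerator(runner, /*doEnable=*/false);
}

// Hypothetical stand-in for the kActionRollbackOrRemoved branch of
// _performPostMemberStateUpdateAction.
void postMemberStateUpdateActionRemoved(PeriodicRunner& runner) {
    shardingOnStepDownHook(runner);
}

// Returns true if the exception escapes the whole chain. In mongod the chain
// runs inside a ThreadPool task with no handler, so it reaches std::terminate().
bool exceptionEscapesChain() {
    PeriodicRunner runner;
    runner.beginShutdown();
    try {
        postMemberStateUpdateActionRemoved(runner);
    } catch (const ShutdownInProgress&) {
        return true;
    }
    return false;
}
```

When the runner is not shutting down, the same chain completes without throwing, which matches the report that the crash needs a shutdown shortly after removal.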
I can only reproduce this with an arbiter, not with a data-bearing node; I'm not sure why.
Proposed fix: make KeysCollectionManager::PeriodicRunner::setFunc catch and log shutdown errors instead of letting them propagate.
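A minimal sketch of the proposed fix, using the same kind of hypothetical simplified types (not mongod's actual classes or logging API): the throwing logic moves into a private helper, and the public setFunc catches only ShutdownInProgress, logs it, and returns. Logging to std::cerr and keeping the last message in a member are assumptions made so the behavior is observable.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for mongod's ShutdownInProgress error.
struct ShutdownInProgress : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified PeriodicRunner with the proposed behavior.
class PeriodicRunner {
public:
    void beginShutdown() { _inShutdown = true; }

    // Swallows shutdown errors and logs them, so callers such as the
    // post-reconfig hook never see the exception. Other errors still propagate.
    void setFunc(std::function<void()> func) {
        try {
            _setFuncOrThrow(std::move(func));
        } catch (const ShutdownInProgress& ex) {
            _lastLog = std::string("setFunc skipped: ") + ex.what();
            std::cerr << _lastLog << '\n';
        }
    }

    bool hasFunc() const { return static_cast<bool>(_func); }
    const std::string& lastLog() const { return _lastLog; }

private:
    // The original throwing behavior, kept as an internal helper.
    void _setFuncOrThrow(std::function<void()> func) {
        if (_inShutdown) {
            throw ShutdownInProgress("node is shutting down");
        }
        _func = std::move(func);
    }

    bool _inShutdown = false;
    std::function<void()> _func;
    std::string _lastLog;
};
```

Catching only the shutdown error, rather than every exception, keeps unexpected failures visible while treating "shutdown already underway" as the benign case it is: the new function would never run anyway.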