|
In a recent Evergreen run of sharding_jscore_passthrough_WT on OS X, a config server (the replica set primary, in particular) hung indefinitely during normal SIGTERM-initiated shutdown after the suite completed. The last lines of output from the mongod were as follows:
[ShardedClusterFixture:job0:configsvr:primary] 2015-10-03T12:16:43.197-0400 I CONTROL [signalProcessingThread] got signal 15 (Terminated: 15), will terminate after current cmd ends
|
[ShardedClusterFixture:job0:configsvr:primary] 2015-10-03T12:16:43.197-0400 I FTDC [signalProcessingThread] Stopping full-time diagnostic data capture
|
[ShardedClusterFixture:job0:configsvr:primary] 2015-10-03T12:16:43.199-0400 I REPL [signalProcessingThread] Stopping replication applier threads
|
Based on correlating this output with a reading of exitCleanly(), this points to a hang in one of the following:
- ReplicationCoordinator::shutdown()
- ShardingState::shutdown()
- Locker::lockGlobalComplete()
No further output was observed from this process; the test harness issued a SIGKILL against the process ~2 hours later. Unfortunately, the hang analyzer did not successfully produce any stack traces from this process (perhaps the hang analyzer is currently broken on OS X?).
In addition to investigating the hang analyzer issue, we could consider adding additional log-level 0 diagnostics to the mongod shutdown process to assist in narrowing down the culprit of this issue.
Evergreen link: https://evergreen.mongodb.com/task/mongodb_mongo_master_osx_108_sharding_jscore_passthrough_WT_1fd64cee88562e77883db5b75ee666a55b15e748_15_10_02_22_00_15
Logkeeper link: https://logkeeper.mongodb.org/build/560ffa9c90413021b1bacd4e/all
|