|
A shutdown-order deadlock is possible between the LogicalSessionCache and the ReplicationCoordinator. It looks like this:
The LogicalSessionCache's thread calls into the catalog cache to fetch routing info for the config.system.sessions collection.
The catalog cache has always performed network operations under a mutex, and on the config server these convert to local storage engine/disk operations because of ShardLocal. If the LogicalSessionCache calls in at an inopportune moment, its thread can therefore block waiting for the majority snapshot to advance. (The mutex itself is not relevant here; what matters is that the operations become local reads on the config server.)
The LogicalSessionCache is shut down and joined before the transport layer, and all of this happens before ReplicationCoordinator::shutdown. This means the replication coordinator cannot shut down until the LogicalSessionCache has shut down, while the LogicalSessionCache's thread may itself be blocked on a wait that depends on replication. That is a circular dependency.
The only thing that prevents this deadlock is that the shutdown command happens to step down the replication coordinator first, but this is a coincidental and lucky occurrence that could be inadvertently broken.
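
The ordering problem can be illustrated with a small, self-contained sketch. This is not the server code: `run_shutdown`, `session_cache_refresh`, and `repl_interrupted` are hypothetical names, and the sketch assumes the blocked majority-snapshot wait is only released when the replication side interrupts it. A join timeout stands in for the hang so the example terminates.

```python
import threading


def run_shutdown(step_down_first: bool, join_timeout: float = 0.5) -> bool:
    """Return True if the simulated shutdown completes, False if it would hang."""
    # Hypothetical stand-in: set when the replication side interrupts waiters.
    repl_interrupted = threading.Event()

    def session_cache_refresh() -> None:
        # Models the routing-info read that blocks until the majority snapshot
        # advances. Assumption: only a replication interrupt releases it.
        repl_interrupted.wait()

    worker = threading.Thread(target=session_cache_refresh, daemon=True)
    worker.start()

    if step_down_first:
        # Models the step-down that the shutdown command happens to do first.
        repl_interrupted.set()

    # Step 1: shut down and join the session cache thread.
    worker.join(timeout=join_timeout)
    if worker.is_alive():
        # A real join has no timeout, so this branch represents the deadlock.
        repl_interrupted.set()  # release the worker so the demo exits cleanly
        worker.join()
        return False

    # Step 2: replication coordinator shutdown, reached only after the join.
    repl_interrupted.set()
    return True
```

Calling `run_shutdown(True)` models the current behaviour, where the step-down releases the blocked thread before the join. Calling `run_shutdown(False)` models what would happen if that step-down were removed or reordered: the join never returns on its own.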
|