Details
-
Bug
-
Resolution: Works as Designed
-
Major - P3
-
None
-
4.0.9
-
Sharding
-
ALL
-
Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-09-09
Description
On a config server, ShardRegistry::reload() waits for a majority read against the local shard. If this coincides with LogicalSessionsCache::refresh(), which performs batch writes, a deadlock can result: while refreshing collectionRoutingInfo, the refresh calls ShardRegistry::getShard(), which can join the in-progress reload().
The related stack traces are in BF-12772.
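The join dependency described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server code: `ToyShardRegistry`, `beginReload()`, and `majorityReadReturned()` are hypothetical names, the model is single-threaded, and `getShard()` takes a timeout purely so the example terminates instead of hanging. It shows that a caller joining an in-progress reload inherits whatever that reload is waiting on.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical, simplified stand-in for the registry; not the MongoDB class.
class ToyShardRegistry {
public:
    // Start a reload that finishes only once the majority read returns.
    void beginReload() {
        _reloadDone = std::promise<void>();
        _inProgress = _reloadDone.get_future().share();
    }

    // Simulates the majority read being satisfied, which completes the reload.
    void majorityReadReturned() {
        _reloadDone.set_value();
    }

    // Joins an in-progress reload if there is one. Returns false when the
    // reload is still blocked after maxWait (the real call has no such bound).
    bool getShard(std::chrono::milliseconds maxWait) const {
        if (!_inProgress.valid()) {
            return true;  // no reload to join
        }
        return _inProgress.wait_for(maxWait) == std::future_status::ready;
    }

private:
    std::promise<void> _reloadDone;
    std::shared_future<void> _inProgress;
};
```

In this model, `getShard()` returns promptly when no reload is running, stays blocked for as long as the majority read is outstanding, and unblocks only after `majorityReadReturned()` is called.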
Suggested Fix
I propose checking in ReplicationCoordinatorImpl::waitUntilOpTimeForRead whether the secondaries are up or down. If they are down, the wait should behave similarly to the case where the _isShutdown flag is set.
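A minimal sketch of the proposed behaviour follows. Everything in it is an assumption made for illustration: `MajorityWaiter`, `WaitResult`, the voter counters, and the integer op times are hypothetical and do not reflect the actual ReplicationCoordinatorImpl interface. The point is only that the wait loop returns early when a majority is unreachable, in the same way it returns early on shutdown.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical outcome codes; the real function reports a status instead.
enum class WaitResult { kOk, kShutdownInProgress, kMajorityUnavailable, kTimedOut };

// Hypothetical stand-in for the state the coordinator would consult.
class MajorityWaiter {
public:
    MajorityWaiter(int totalVoters, int reachableVoters)
        : _total(totalVoters), _reachable(reachableVoters) {}

    void setReachableVoters(int n) {
        { std::lock_guard<std::mutex> lk(_mutex); _reachable = n; }
        _cv.notify_all();
    }

    void setShutdown() {
        { std::lock_guard<std::mutex> lk(_mutex); _isShutdown = true; }
        _cv.notify_all();
    }

    void advanceMajorityCommitted(long long opTime) {
        { std::lock_guard<std::mutex> lk(_mutex); _committed = opTime; }
        _cv.notify_all();
    }

    // Waits until the majority-committed op time reaches target, or bails out.
    WaitResult waitUntilOpTimeForRead(long long target,
                                      std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (true) {
            if (_isShutdown) return WaitResult::kShutdownInProgress;
            if (_committed >= target) return WaitResult::kOk;
            // Proposed check: without a reachable majority (self included),
            // the target can never become majority-committed, so stop waiting.
            if (2 * _reachable <= _total) return WaitResult::kMajorityUnavailable;
            if (std::chrono::steady_clock::now() >= deadline) return WaitResult::kTimedOut;
            _cv.wait_until(lk, deadline);
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _total;
    int _reachable;
    long long _committed = 0;
    bool _isShutdown = false;
};
```

With three voters of which only one is reachable, the call returns `kMajorityUnavailable` immediately rather than blocking until the timeout, which is the behaviour the fix asks for.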