- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: 4.0.9
- Component/s: Sharding
- Labels:
- Sharding
- ALL
- Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-09-09
On a config server, ShardRegistry::reload() waits for a majority read against the local shard. If this coincides with LogicalSessionsCache::refresh(), which performs batch writes, the refresh can deadlock: while refreshing collectionRoutingInfo it calls ShardRegistry::getShard(), which can join the in-progress reload().
The related stack traces are in BF-12772.
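To make the "join the reload()" hazard concrete, here is a minimal, self-contained sketch. It is not MongoDB code: MiniShardRegistry, startReload and the timeout parameter are hypothetical names chosen for illustration, and the majority read is modeled as a plain std::shared_future. It only shows that a caller joining an in-progress reload inherits whatever that reload is blocked on.

```cpp
#include <chrono>
#include <future>

// Hypothetical model (not the real ShardRegistry): every caller of getShard()
// joins the single in-progress reload instead of starting its own.
class MiniShardRegistry {
public:
    // Registers a reload that finishes only when the modeled majority read
    // (represented by the future) completes.
    void startReload(std::shared_future<void> majorityReadDone) {
        _inProgressReload = std::move(majorityReadDone);
    }

    // Joins the in-progress reload, if any. Returns true if the caller could
    // proceed, false if the reload was still blocked when the timeout expired.
    // The timeout exists only so the sketch terminates; the scenario in the
    // ticket blocks instead of timing out.
    bool getShard(std::chrono::milliseconds timeout) const {
        if (!_inProgressReload.valid()) {
            return true;  // no reload in progress, nothing to join
        }
        return _inProgressReload.wait_for(timeout) == std::future_status::ready;
    }

private:
    std::shared_future<void> _inProgressReload;
};
```

In this model, getShard() returns false for as long as the promise behind the future is unfulfilled, which mirrors a refresh thread stuck behind a reload whose majority read cannot complete.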
Suggested Fix
I propose checking in ReplicationCoordinatorImpl::waitUntilOpTimeForRead whether the secondaries are up or down. When they are down, it should behave similarly to the case where the _isShutdown flag is set.
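A rough sketch of the proposed behavior, under stated assumptions: MiniReplCoordinator, WaitResult, _majorityReachable and the setter methods are hypothetical stand-ins, not the actual ReplicationCoordinatorImpl members or status codes. Only the idea of treating "secondaries down" like the _isShutdown case is taken from the proposal.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical result codes for the sketch.
enum class WaitResult { kOk, kShutdownInProgress, kMajorityUnavailable, kTimedOut };

class MiniReplCoordinator {
public:
    // Waits until the majority-committed op time reaches targetOpTime, but
    // returns early on shutdown or when a majority is known to be unreachable.
    WaitResult waitUntilOpTimeForRead(long long targetOpTime,
                                      std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        const bool woke = _cv.wait_until(lk, deadline, [&] {
            return _isShutdown || !_majorityReachable ||
                _committedOpTime >= targetOpTime;
        });
        if (_isShutdown) return WaitResult::kShutdownInProgress;
        if (_committedOpTime >= targetOpTime) return WaitResult::kOk;
        // Proposed addition: fail fast instead of waiting for an op time that
        // cannot become majority-committed while the secondaries are down.
        if (!_majorityReachable) return WaitResult::kMajorityUnavailable;
        return woke ? WaitResult::kOk : WaitResult::kTimedOut;
    }

    void advanceCommittedOpTime(long long opTime) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (opTime > _committedOpTime) _committedOpTime = opTime;
        _cv.notify_all();
    }

    void setMajorityReachable(bool reachable) {
        std::lock_guard<std::mutex> lk(_mutex);
        _majorityReachable = reachable;
        _cv.notify_all();  // wake waiters so they can observe the change
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _isShutdown = true;
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    long long _committedOpTime = 0;
    bool _majorityReachable = true;
    bool _isShutdown = false;
};
```

With this shape, a waiter whose target op time is already committed still succeeds, while one that depends on unreachable secondaries gets an error immediately, the same way it would during shutdown.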