[SERVER-41217] Potential deadlock between ShardRegistry and LSC refresh Created: 17/May/19  Updated: 27/Oct/23  Resolved: 05/Sep/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Misha Tyulenev Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Works as Designed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding
Operating System: ALL
Sprint: Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-09-09
Participants:

 Description   

ShardRegistry::reload() on a config server waits for a majority read against the local replica set. If this coincides with a LogicalSessionCache::refresh(), which performs batch writes, the two may end up in a deadlock: while refreshing the collection routing info the refresh calls ShardRegistry::getShard(), and that call can join the in-progress reload().
The related stack traces are in BF-12772.
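For illustration only, here is a minimal, self-contained C++ sketch (not MongoDB source) of the "join the in-flight reload" pattern the description refers to. The Registry class and its get()/_doReload() methods are hypothetical stand-ins: a caller that arrives while a reload is already running waits on the same shared future rather than starting its own reload, so if the reload itself is blocked (for example on a majority-committed read that cannot advance), every joiner blocks with it.

#include <future>
#include <mutex>
#include <string>

class Registry {
public:
    // Returns data, starting a reload if none is in flight; otherwise
    // "joins" the reload that is already running.
    std::string get() {
        std::shared_future<std::string> inFlight;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (!_reloadInFlight.valid()) {
                // Start a reload; its result is shared with later callers.
                // (Clearing the future once it completes is omitted for brevity.)
                _reloadInFlight = std::async(std::launch::async, [this] {
                                      return _doReload();
                                  }).share();
            }
            inFlight = _reloadInFlight;
        }
        // If _doReload() is stuck waiting for replication, this wait is stuck
        // too -- this is the "can join the reload()" part of the report.
        return inFlight.get();
    }

private:
    std::string _doReload() {
        // Placeholder for "read the shard list with majority read concern".
        // With the secondaries down, the real call would block indefinitely.
        return "shard list";
    }

    std::mutex _mutex;
    std::shared_future<std::string> _reloadInFlight;
};

int main() {
    Registry registry;
    return registry.get().empty() ? 1 : 0;
}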

Suggested Fix

I propose checking in ReplicationCoordinatorImpl::waitUntilOpTimeForRead whether the secondaries are up or down. When they are down, it should behave similarly to the case where the _isShutdown flag is set.
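As a rough sketch of that proposal only (not the actual ReplicationCoordinatorImpl code), the self-contained C++ example below shows the intended shape: the wait loop periodically re-evaluates abort conditions -- shutdown, or "no secondaries reachable, so the majority point cannot advance" -- and fails the wait instead of blocking forever. ReplState, secondariesReachable, and the WaitStatus codes are hypothetical names used for illustration.

#include <chrono>
#include <condition_variable>
#include <mutex>

enum class WaitStatus { kOk, kShutdownInProgress, kNoReachableSecondaries };

struct ReplState {
    std::mutex mutex;
    std::condition_variable opTimeAdvanced;
    long long committedOpTime = 0;
    bool shutdown = false;
    bool secondariesReachable = true;
};

WaitStatus waitUntilOpTimeForRead(ReplState& repl, long long targetOpTime) {
    std::unique_lock<std::mutex> lk(repl.mutex);
    while (repl.committedOpTime < targetOpTime) {
        if (repl.shutdown) {
            return WaitStatus::kShutdownInProgress;  // existing behaviour
        }
        if (!repl.secondariesReachable) {
            // Proposed addition: the majority point cannot advance, so give
            // up instead of blocking the caller (and everything joined to it).
            return WaitStatus::kNoReachableSecondaries;
        }
        // Wake up periodically so the checks above are re-evaluated even if
        // no opTime advancement ever signals the condition variable.
        repl.opTimeAdvanced.wait_for(lk, std::chrono::seconds(1));
    }
    return WaitStatus::kOk;
}

int main() {
    ReplState repl;
    repl.secondariesReachable = false;  // simulate all secondaries being down
    return waitUntilOpTimeForRead(repl, 10) == WaitStatus::kNoReachableSecondaries ? 0 : 1;
}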



 Comments   
Comment by Kaloian Manassiev [ 05/Sep/19 ]

There is no deadlock between the LSC thread and the ShardRegistry reload. All the stack traces in BFG-280106 (the main logs from which BF-12772 was created) show everybody involved waiting on either an afterOpTime read or a majority write against the config server primary. However, the config server primary has crashed with the invariant failure pool->_checkedOutPool.empty() in src/mongo/executor/connection_pool.cpp.

The more interesting issue in BF-12772 is why the remaining 2 nodes weren't able to elect a primary, but I will continue that conversation there.

Comment by Misha Tyulenev [ 10/Jun/19 ]

matthew.saltz The issues are related, but the scenario is not exactly the same: BF-12772 does not create the config.system.sessions collection. However, the hang condition is similar - waiting for the majority while the secondary nodes are down. I'll look into it further to check whether it has the same root cause.

Comment by Matthew Saltz (Inactive) [ 07/Jun/19 ]

misha.tyulenev I think this ticket may be a dupe of the one linked, but I haven't checked this one to see if the symptoms are exactly the same.

Comment by Matthew Saltz (Inactive) [ 04/Jun/19 ]

So is this ticket description inaccurate then?

Comment by Misha Tyulenev [ 03/Jun/19 ]

Good point - unless replication itself calls getCollectionRoutingInfo, it should not block.

Comment by Matthew Saltz (Inactive) [ 03/Jun/19 ]

One thing I'm not quite following is: Why does the LogicalSessionCache refresh block replication?

Comment by Misha Tyulenev [ 03/Jun/19 ]

matthew.saltz I don't think it is a direct dup; the scenario is slightly different. However, the fix for this bug will likely fix the BF you are looking at. In BF-12772 the following scenario happens on node0 of the config shard:
The LogicalSessionCache thread:
calls mongo::CatalogCache::_getCollectionRoutingInfoAt, which schedules refreshCollectionRoutingInfo, which calls ShardRegistry::getShard, which can join the ShardRegistry::reload.

The ShardRegistry refresh thread calls refresh and waits for replication to complete, which will only happen once the write in the LogicalSessionCache refresh thread finishes.
