[SERVER-41217] Potential deadlock between ShardRegistry and LSC refresh Created: 17/May/19 Updated: 27/Oct/23 Resolved: 05/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.0.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Misha Tyulenev | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Sprint: | Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-09-09 |
| Participants: |
| Description |
|
ShardRegistry::reload() on a config server waits for a majority read on the local shard. If this coincides with a LogicalSessionCache::refresh(), which performs batch writes, it may end up in a deadlock: the refresh calls ShardRegistry::getShard() while refreshing the collection routing info, which can join the in-progress reload(). Suggested Fix: I propose checking in ReplicationCoordinatorImpl::waitUntilOpTimeForRead whether the secondaries are up or down; if they are down, it should behave similarly to the case when the _isShutdown flag is set. |
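A minimal sketch of the suggested behaviour, not the actual ReplicationCoordinatorImpl code: the class, member, and status names below (OpTimeWaiter, _inShutdown, _secondariesDown) are invented stand-ins for whatever state the real coordinator tracks; only the idea of bailing out of the wait when secondaries are down, analogous to the shutdown path, comes from the ticket.

```cpp
// Illustrative only: a standalone wait helper, not MongoDB source.
#include <condition_variable>
#include <mutex>
#include <string>

struct Status {
    bool ok;
    std::string reason;
};

class OpTimeWaiter {
public:
    // Blocks until the target op time has been applied, but gives up early if
    // the node is shutting down or no secondaries are reachable, instead of
    // waiting forever on a condition that can never be satisfied.
    Status waitUntilOpTimeForRead(long long targetOpTime) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] {
            return _lastAppliedOpTime >= targetOpTime || _inShutdown || _secondariesDown;
        });
        if (_inShutdown)
            return {false, "shutdown in progress"};
        if (_secondariesDown)
            return {false, "no secondaries available to advance the committed op time"};
        return {true, ""};
    }

    // Called as replication progresses.
    void advanceTo(long long opTime) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _lastAppliedOpTime = opTime;
        }
        _cv.notify_all();
    }

    // Called by topology monitoring when all secondaries become unreachable.
    void markSecondariesDown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _secondariesDown = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    long long _lastAppliedOpTime = 0;
    bool _inShutdown = false;
    bool _secondariesDown = false;
};
```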
| Comments |
| Comment by Kaloian Manassiev [ 05/Sep/19 ] |
|
There is no deadlock between the LSC thread and the ShardRegistry reload. All the stack traces in BFG-280106 (which contains the main logs from which BF-12772 was created) point to everybody involved waiting on either an afterOpTime read or a majority write against the config server primary. However, the config server primary has crashed with an invariant failure pool->_checkedOutPool.empty() in src/mongo/executor/connection_pool.cpp. The more interesting issue in BF-12772 is why the remaining 2 nodes weren't able to elect a primary, but I will continue that conversation there. |
| Comment by Misha Tyulenev [ 10/Jun/19 ] |
|
matthew.saltz The issues are related but the scenario is not exactly the same: BF-12772 does not create the config.system.sessions collection. However, the hang condition is similar: waiting for the majority while the secondary nodes are down. I'll look into it more to check whether it has the same root cause. |
| Comment by Matthew Saltz (Inactive) [ 07/Jun/19 ] |
|
misha.tyulenev I think this ticket may be a dupe of the one linked, but I haven't checked this one to see whether the symptoms are exactly the same. |
| Comment by Matthew Saltz (Inactive) [ 04/Jun/19 ] |
|
So is this ticket description inaccurate then? |
| Comment by Misha Tyulenev [ 03/Jun/19 ] |
|
Good point; unless replication is calling getCollectionRoutingInfo, it should not block. |
| Comment by Matthew Saltz (Inactive) [ 03/Jun/19 ] |
|
One thing I'm not quite following is: Why does the LogicalSessionCache refresh block replication? |
| Comment by Misha Tyulenev [ 03/Jun/19 ] |
|
matthew.saltz I don't think it is a direct dup; the scenario is slightly different. However, the fix for this bug will likely fix the BF you are looking at. In BF-12772 the following scenario happens on node0 of the config shard: the ShardRegistry refresh thread calls reload() and waits for replication to complete, which will only complete once the write in the LogicalSessionCache refresh thread finishes. |
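A toy model of the wait chain described in this comment, not server code: the thread names and the single majorityCommitted flag are simplifications (in the real server these are op times tracked by the replication coordinator). With the secondaries down, both threads block indefinitely on the same condition, which is consistent with the 05/Sep analysis that this is an unbounded wait rather than a lock-ordering deadlock.

```cpp
// Toy illustration only: two threads stuck behind a majority condition that
// can never be satisfied while the secondaries are down.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool majorityCommitted = false;  // never set: no secondaries to acknowledge writes

void lscRefreshWrite() {
    // Issues the sessions batch write, then waits for majority acknowledgement.
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [] { return majorityCommitted; });  // blocks forever
    std::cout << "LSC write majority-acknowledged\n";
}

void shardRegistryReload() {
    // Waits for a majority read at an op time that includes the LSC write.
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [] { return majorityCommitted; });  // blocks behind the same condition
    std::cout << "ShardRegistry reload finished\n";
}

int main() {
    std::thread a(lscRefreshWrite);
    std::thread b(shardRegistryReload);
    a.join();  // never returns while the secondaries stay down
    b.join();
    return 0;
}
```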