[SERVER-35619] Potential deadlock in ShardRegistry::replicaSetChangeShardRegistryUpdateHook Created: 15/Jun/18 Updated: 08/Jan/24 Resolved: 09/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | neweng, sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding 2018-09-24, Sharding 2020-03-09, Sharding 2020-03-23 |
| Participants: |
| Description |
|
ShardRegistry::replicaSetChangeShardRegistryUpdateHook is registered in ReplicaSetMonitor::setSynchronousConfigChangeHook(), which is explicitly documented as calling the hook while holding the RSM mutex. Unfortunately, that hook acquires ShardRegistryData::_mutex at https://github.com/mongodb/mongo/blob/82b62cf1e513657a0c35d757cf37eab0231ebc9b/src/mongo/s/client/shard_registry.cpp#L526.
These mutexes are acquired in the opposite order in ShardRegistryData::toBSON(), which transitively calls into ReplicaSetMonitor::getServerAddress(), so this could result in a deadlock.

Suggested Fix

1. Split off the serverAddressLock in ReplicaSetMonitor so that the lock eventually taken from ShardRegistry is not the same one taken in Refresher::_refreshUntilMatches. |
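To make the inversion concrete, below is a minimal standalone C++ sketch (mutex and function names are simplified stand-ins, not the actual mongo source). Thread A mirrors the config-change hook path and locks RSM then registry; thread B mirrors the toBSON() path and locks registry then RSM, a classic ABBA inversion.

```cpp
#include <mutex>

std::mutex rsmMutex;            // stand-in for the ReplicaSetMonitor mutex
std::mutex shardRegistryMutex;  // stand-in for ShardRegistryData::_mutex

// Thread A: the RSM calls the registered hook while holding its own mutex,
// and the hook then acquires the registry mutex (RSM -> registry).
void configChangeHookPath() {
    std::lock_guard<std::mutex> rsmLock(rsmMutex);
    // ... replicaSetChangeShardRegistryUpdateHook() runs here ...
    std::lock_guard<std::mutex> registryLock(shardRegistryMutex);
    // update the cached connection string for the shard
}

// Thread B: toBSON() holds the registry mutex and transitively calls
// ReplicaSetMonitor::getServerAddress() (registry -> RSM).
void toBSONPath() {
    std::lock_guard<std::mutex> registryLock(shardRegistryMutex);
    // ... Shard::getConnString() -> ReplicaSetMonitor::getServerAddress() ...
    std::lock_guard<std::mutex> rsmLock(rsmMutex);
    // read the current server address
}
```

If thread A acquires rsmMutex at the same moment thread B acquires shardRegistryMutex, each then blocks forever on the mutex the other holds.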
| Comments |
| Comment by Githook User [ 09/Mar/20 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com>
Message: |
| Comment by Sheeri Cabral (Inactive) [ 06/Feb/20 ] |
|
Kal's simpler suggestion was agreed upon by Kal, Jason and Esha in a triage meeting today. |
| Comment by Kaloian Manassiev [ 11/Feb/19 ] |
|
As mentioned in the description, this locking-order violation only happens under ShardRegistry::toBSON (which is only used by the getShardMap command), so it is fairly unlikely to be hit in practice, but if it did happen it would stall the entire node. Could a potentially simpler fix be to just copy the shard names here into a string vector, release the ShardRegistryData::_mutex, and then use getShard outside of the lock to obtain each shard's connection string and other data? This could fail to return all shards if the command is run right as shard membership changes, but so be it - this condition is far too rare to warrant a complex solution that introduces a third mutex. |
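For illustration, here is a rough, self-contained sketch of that simpler approach. The names (_mutex, _lookup, getShard, Shard) are hypothetical stand-ins, not the real ShardRegistry internals; the point is only that _mutex is released before any per-shard lookup that could reach into the ReplicaSetMonitor.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

struct Shard {
    std::string connString;  // stand-in for Shard::getConnString()
};

std::mutex _mutex;                                      // plays ShardRegistryData::_mutex
std::map<std::string, std::shared_ptr<Shard>> _lookup;  // hypothetical name -> shard map

// Hypothetical lookup that takes its own short-lived lock internally.
std::shared_ptr<Shard> getShard(const std::string& name) {
    std::lock_guard<std::mutex> lk(_mutex);
    auto it = _lookup.find(name);
    return it == _lookup.end() ? nullptr : it->second;
}

std::vector<std::string> buildShardMap() {
    // 1. Copy just the shard names while holding _mutex.
    std::vector<std::string> names;
    {
        std::lock_guard<std::mutex> lk(_mutex);
        for (auto&& entry : _lookup)
            names.push_back(entry.first);
    }  // 2. _mutex is released before touching any ReplicaSetMonitor state.

    // 3. Resolve each shard outside the lock. A shard that disappeared in
    //    the meantime is simply skipped, which the comment deems acceptable.
    std::vector<std::string> connStrings;
    for (auto&& name : names)
        if (auto shard = getShard(name))
            connStrings.push_back(shard->connString);
    return connStrings;
}
```

Because no ReplicaSetMonitor (or other) lock is ever requested while _mutex is held, the RSM-then-registry path in the hook can no longer form a cycle with this code path.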