[SERVER-48030] Fix deadlock with GetShardMap and old RSM Created: 08/May/20 Updated: 29/Oct/23 Resolved: 27/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Lamont Nelson | Assignee: | Lamont Nelson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v5.0, v4.4 | ||||||||
| Sprint: | Sharding 2020-05-18, Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-24, Sharding 2020-10-19, Sharding 2020-11-02 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 24 | ||||||||
| Description |
|
The GetShardMap command holds the ShardRegistryData _mutex and tries to acquire the ScanningReplicaSetMonitor::SetState lock via a call to ScanningReplicaSetMonitor::getServerAddress. At the same time, the replica set monitor is publishing its onConfirmedSet event: it holds the SetState _mutex and tries to acquire the ShardRegistryData _mutex via a call to rebuildShardIfExists. Each thread waits on the lock the other one holds, so neither can make progress. |
| Comments |
| Comment by Githook User [ 27/Oct/20 ] |
|
Author: LaMont Nelson (lamont.nelson@mongodb.com, username: lamontnelson) Message: |
| Comment by Lamont Nelson [ 27/Oct/20 ] |
|
Code review: https://mongodbcr.appspot.com/681080001/ |
| Comment by Lamont Nelson [ 08/May/20 ] |
|
This issue doesn't exist with the new RSM, since it doesn't hold its topology state mutex (in TopologyManager) while publishing the onConfirmedSet event. |
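The lock discipline described in the last comment can be sketched in C++. This is a minimal illustration, assuming a monitor that updates its state under its own mutex, copies what the listener needs, and releases the mutex before invoking the onConfirmedSet callback. `MonitorSketch`, `ShardRegistrySketch`, `confirmSet`, and `runScenario` are hypothetical stand-ins, not MongoDB's actual classes, and the sketch is not necessarily the change that shipped in 4.4.2.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a replica set monitor with its own state mutex.
class MonitorSketch {
public:
    explicit MonitorSketch(std::function<void(const std::string&)> onConfirmedSet)
        : _onConfirmedSet(std::move(onConfirmedSet)) {}

    // Models getServerAddress: needs the monitor mutex.
    std::string getServerAddress() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _address;
    }

    // Publishes the event WITHOUT holding _mutex: update and snapshot state
    // under the lock, release it, then call the listener.
    void confirmSet(const std::string& address) {
        std::string snapshot;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _address = address;
            snapshot = _address;
        }  // _mutex is released here, before the callback runs.
        _onConfirmedSet(snapshot);
    }

private:
    std::mutex _mutex;
    std::string _address = "rs0/host1:27017";
    std::function<void(const std::string&)> _onConfirmedSet;
};

// Hypothetical stand-in for the shard registry data and its mutex.
class ShardRegistrySketch {
public:
    // Models rebuildShardIfExists: the listener path takes the registry mutex.
    void rebuildShardIfExists(const std::string& host) {
        std::lock_guard<std::mutex> lk(_mutex);
        _lastHost = host;
        ++_rebuilds;
    }

    // Models GetShardMap: holds the registry mutex while asking the monitor
    // for an address, which takes the monitor mutex.
    std::string getShardMap(MonitorSketch& monitor) {
        std::lock_guard<std::mutex> lk(_mutex);
        return monitor.getServerAddress();
    }

    int rebuilds() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _rebuilds;
    }

private:
    std::mutex _mutex;
    std::string _lastHost;
    int _rebuilds = 0;
};

// Runs both code paths concurrently and returns how many listener calls
// completed. Because the publisher never waits on the registry mutex while
// holding the monitor mutex, there is no lock-order cycle.
int runScenario(int iterations) {
    ShardRegistrySketch registry;
    MonitorSketch monitor([&registry](const std::string& host) {
        registry.rebuildShardIfExists(host);
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) registry.getShardMap(monitor);
    });
    std::thread publisher([&] {
        for (int i = 0; i < iterations; ++i) monitor.confirmSet("rs0/host1:27017");
    });
    reader.join();
    publisher.join();
    return registry.rebuilds();
}
```

Moving the `_onConfirmedSet(snapshot)` call inside the locked block in `confirmSet` would make the publisher hold the monitor mutex while waiting for the registry mutex, which is the opposite order from `getShardMap` and recreates the deadlock described in this ticket.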