[SERVER-48030] Fix deadlock with GetShardMap and old RSM Created: 08/May/20  Updated: 29/Oct/23  Resolved: 27/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.2

Type: Bug Priority: Major - P3
Reporter: Lamont Nelson Assignee: Lamont Nelson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4
Sprint: Sharding 2020-05-18, Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-24, Sharding 2020-10-19, Sharding 2020-11-02
Participants:
Linked BF Score: 24

 Description   

The GetShardMap command holds the ShardRegistryData _mutex while trying to obtain the ScanningReplicaSetMonitor::SetState lock via a call to ScanningReplicaSetMonitor::getServerAddress. At the same time, the replica set monitor is publishing its onConfirmedSet event: it holds the SetState _mutex and is trying to obtain the ShardRegistryData _mutex via a call to rebuildShardIfExists. Each thread is waiting on the mutex the other holds, so neither can proceed.
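
The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not the server code: the two std::mutex objects are stand-ins for the ShardRegistryData _mutex and the SetState _mutex, and the names InversionResult and demonstrateLockOrderInversion are hypothetical. It uses try_lock instead of lock so that the program reports the inversion rather than hanging.

```cpp
#include <future>
#include <mutex>
#include <thread>

// Records whether each thread managed to take its second mutex.
struct InversionResult {
    bool getShardMapGotSetState = false;
    bool monitorGotRegistry = false;
};

// Hypothetical reproduction of the two code paths named in the description.
InversionResult demonstrateLockOrderInversion() {
    std::mutex registryMutex;  // stand-in for the ShardRegistryData _mutex
    std::mutex setStateMutex;  // stand-in for the SetState _mutex

    // Promises are used only to force the problematic interleaving.
    std::promise<void> registryHeld, setStateHeld, getShardMapTried, monitorTried;
    std::future<void> registryHeldF = registryHeld.get_future();
    std::future<void> setStateHeldF = setStateHeld.get_future();
    std::future<void> getShardMapTriedF = getShardMapTried.get_future();
    std::future<void> monitorTriedF = monitorTried.get_future();

    InversionResult result;

    // Path 1: GetShardMap holds the registry mutex, then wants SetState.
    std::thread getShardMap([&] {
        std::lock_guard<std::mutex> lk(registryMutex);
        registryHeld.set_value();
        setStateHeldF.wait();  // wait until the monitor holds its mutex
        if (setStateMutex.try_lock()) {
            result.getShardMapGotSetState = true;
            setStateMutex.unlock();
        }
        getShardMapTried.set_value();
        monitorTriedF.wait();  // keep holding until the monitor has tried
    });

    // Path 2: the monitor holds SetState, then wants the registry mutex.
    std::thread monitor([&] {
        std::lock_guard<std::mutex> lk(setStateMutex);
        setStateHeld.set_value();
        registryHeldF.wait();  // wait until GetShardMap holds its mutex
        if (registryMutex.try_lock()) {
            result.monitorGotRegistry = true;
            registryMutex.unlock();
        }
        monitorTried.set_value();
        getShardMapTriedF.wait();  // keep holding until GetShardMap has tried
    });

    getShardMap.join();
    monitor.join();
    return result;
}
```

With this interleaving neither try_lock can succeed, which is the condition under which blocking lock() calls would wait on each other forever.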



 Comments   
Comment by Githook User [ 27/Oct/20 ]

Author:

{'name': 'LaMont Nelson', 'email': 'lamont.nelson@mongodb.com', 'username': 'lamontnelson'}

Message: SERVER-48030 Fix deadlock with GetShardMap and old RSM
Branch: v4.4
https://github.com/mongodb/mongo/commit/16186d9f4a5f7dd83af728d3d5c3660420181b38

Comment by Lamont Nelson [ 27/Oct/20 ]

Code review: https://mongodbcr.appspot.com/681080001/

Comment by Lamont Nelson [ 08/May/20 ]

This issue doesn't exist with the new RSM, since it doesn't hold its topology state mutex (in TopologyManager) while publishing the onConfirmedSet event.
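
The pattern this comment describes, publishing an event only after the state mutex has been released, can be sketched as follows. This is an illustration under stated assumptions, not the TopologyManager implementation: the class TopologySketch, its Listener type, and the confirmSet method are hypothetical names; only getServerAddress echoes a name from the ticket.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical monitor that never invokes its listener while holding _mutex.
class TopologySketch {
public:
    using Listener = std::function<void(const std::string&)>;

    explicit TopologySketch(Listener listener) : _listener(std::move(listener)) {}

    // Updates the state under the mutex, then publishes from a snapshot.
    void confirmSet(const std::string& primary) {
        std::string snapshot;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _primary = primary;
            snapshot = _primary;
        }  // _mutex is released here, before the listener runs
        _listener(snapshot);
    }

    // Safe to call from inside the listener because _mutex is not held there.
    std::string getServerAddress() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _primary;
    }

private:
    mutable std::mutex _mutex;
    std::string _primary;
    Listener _listener;
};
```

Because the listener runs outside the critical section, it can take other locks or call back into the monitor without creating a lock-order cycle. The trade-off is that the listener sees a snapshot that may already be stale by the time it runs.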

Generated at Thu Feb 08 05:15:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.