[SERVER-56523] Avoid locking the ReplicaSetMonitorManager's mutex when garbage collecting ReplicaSetMonitors Created: 30/Apr/21  Updated: 29/Oct/23  Resolved: 21/May/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0, 5.0.0-rc1

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.9
Participants:
Linked BF Score: 106

 Description   

The ReplicaSetMonitorManager's mutex has a hierarchical locking level of 6. This means that it must be acquired before any lower-numbered mutex (on the same thread). This mutex is locked every time a ReplicaSetMonitor is destroyed, so a destructor that runs while the thread already holds a lower-numbered mutex would violate the lock order. Enough code uses mutexes at levels 6 and below that we should re-evaluate whether it is safe to lock a level 6 mutex in such a common code path.
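To make the ordering rule concrete, here is a minimal, generic sketch of a hierarchical mutex. It is not MongoDB's implementation: `HierarchicalMutex`, `currentLevel`, and `lockOrderIsValid` are hypothetical names, and the sketch assumes that a thread may only acquire a mutex whose level is lower than that of the innermost mutex it already holds.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical illustration only; not MongoDB's actual mutex implementation.
class HierarchicalMutex {
public:
    explicit HierarchicalMutex(int level) : _level(level) {}

    void lock() {
        // A thread may only take a mutex numbered lower than the one it holds.
        if (_level >= currentLevel) {
            throw std::logic_error("lock hierarchy violated");
        }
        _mutex.lock();
        _previousLevel = currentLevel;
        currentLevel = _level;
    }

    void unlock() {
        // Assumes mutexes are released in reverse order of acquisition.
        assert(currentLevel == _level);
        currentLevel = _previousLevel;
        _mutex.unlock();
    }

private:
    std::mutex _mutex;
    const int _level;
    int _previousLevel = 0;
    static thread_local int currentLevel;  // level of the innermost mutex held
};

thread_local int HierarchicalMutex::currentLevel = INT_MAX;

// Returns true if locking `first` and then `second` respects the hierarchy.
bool lockOrderIsValid(HierarchicalMutex& first, HierarchicalMutex& second) {
    try {
        std::lock_guard<HierarchicalMutex> outer(first);
        std::lock_guard<HierarchicalMutex> inner(second);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

Under these assumptions, taking a level 6 mutex and then a level 1 mutex is accepted, while taking the level 1 mutex first and then the level 6 mutex is rejected. The second case is the pattern a destructor risks when it locks a level 6 mutex from an arbitrary call site.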



 Comments   
Comment by Githook User [ 27/May/21 ]

Author: Andrew Shuvalov <andrew.shuvalov@mongodb.com> (shuvalov-mdb)

Message: SERVER-56523: potential deadlock - avoid locking the ReplicaSetMonitorManager mutex when garbage collecting RSM
Branch: v5.0
https://github.com/mongodb/mongo/commit/1cb914fc0f6f14a17b61358db1d185703e718162

Comment by Andrew Shuvalov (Inactive) [ 21/May/21 ]

Requesting backports.

Comment by Githook User [ 21/May/21 ]

Author: Andrew Shuvalov <andrew.shuvalov@mongodb.com> (shuvalov-mdb)

Message: SERVER-56523: potential deadlock - avoid locking the ReplicaSetMonitorManager mutex when garbage collecting RSM
Branch: master
https://github.com/mongodb/mongo/commit/f25468750488f8445794805873df5de456d9c557

Comment by Andrew Shuvalov (Inactive) [ 14/May/21 ]

The way to solve this is to add one more mutex, at level 1, that guards "pending" garbage removals. Registering a pending removal locks only the level 1 mutex, so it cannot deadlock. To retrieve data from the cache, first lock the level 6 mutex, then the nested level 1 mutex, then check for pending garbage removals and perform them if necessary. Finally, return the result while still holding both mutexes in the proper order.
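The approach described above could be sketched roughly as follows. This is a simplified illustration, not the code from the linked commits: `Monitor`, `MonitorCache`, `queueRemoval`, and the other names are hypothetical, plain `std::mutex` objects stand in for the level 6 and level 1 mutexes, and the cache is assumed to outlive every monitor it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for ReplicaSetMonitor.
struct Monitor {
    std::string name;
};

// Hypothetical stand-in for ReplicaSetMonitorManager. Must outlive its monitors.
class MonitorCache {
public:
    std::shared_ptr<Monitor> getOrCreate(const std::string& name) {
        std::lock_guard<std::mutex> cacheLock(_cacheMutex);      // "level 6" first
        std::lock_guard<std::mutex> pendingLock(_pendingMutex);  // then "level 1"
        // Perform any pending garbage removal before using the cache.
        for (const std::string& stale : _pending) {
            auto it = _cache.find(stale);
            // Skip names that were re-created since the removal was queued.
            if (it != _cache.end() && it->second.expired()) {
                _cache.erase(it);
            }
        }
        _pending.clear();

        std::shared_ptr<Monitor> monitor = _cache[name].lock();
        if (!monitor) {
            // The deleter runs on whichever thread drops the last reference.
            monitor = std::shared_ptr<Monitor>(new Monitor{name}, [this](Monitor* m) {
                queueRemoval(m->name);
                delete m;
            });
            _cache[name] = monitor;
        }
        return monitor;  // both mutexes are still held while the result is produced
    }

    std::size_t cacheSize() {
        std::lock_guard<std::mutex> lock(_cacheMutex);
        return _cache.size();
    }

    std::size_t pendingCount() {
        std::lock_guard<std::mutex> lock(_pendingMutex);
        return _pending.size();
    }

private:
    // Destruction path: takes only the "level 1" mutex, never the cache mutex.
    void queueRemoval(const std::string& name) {
        std::lock_guard<std::mutex> lock(_pendingMutex);
        _pending.push_back(name);
    }

    std::mutex _cacheMutex;    // stands in for the level 6 mutex
    std::mutex _pendingMutex;  // stands in for the new level 1 mutex
    std::map<std::string, std::weak_ptr<Monitor>> _cache;
    std::vector<std::string> _pending;
};

// Usage sketch: a stale entry is only erased on the next lookup.
void exampleUsage() {
    MonitorCache cache;
    {
        auto a = cache.getOrCreate("rs0");
        auto b = cache.getOrCreate("rs0");
        assert(a == b);  // both lookups return the same cached monitor
    }  // last reference dropped: removal is queued, the cache map is untouched
    assert(cache.pendingCount() == 1);
    assert(cache.cacheSize() == 1);
    auto c = cache.getOrCreate("rs1");  // drains the queue, then creates rs1
    assert(cache.pendingCount() == 0);
    assert(cache.cacheSize() == 1);
}
```

The point of the design in this sketch is that the destruction path only ever touches the lowest-level mutex, so it is safe regardless of which other mutexes the destroying thread holds. The cost is that stale cache entries linger until the next lookup drains the pending queue.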

Please contact me for more details.
