[SERVER-56523] Avoid locking the ReplicaSetMonitorManager's mutex when garbage collecting ReplicaSetMonitors Created: 30/Apr/21 Updated: 29/Oct/23 Resolved: 21/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc0, 5.0.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Andrew Shuvalov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v5.0, v4.9 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 106 | ||||||||
| Description |
|
The ReplicaSetMonitorManager's mutex has a hierarchical locking level of 6. This means that a thread must acquire it before any lower-numbered mutex it holds at the same time. This mutex is locked every time a ReplicaSetMonitor is destructed. Mutexes at level 6 and below are used widely enough that a destructor can easily run while the thread already holds one of them, so we should re-evaluate whether it is safe to lock a level 6 mutex in such a common code path. |
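To illustrate the ordering rule described above, here is a minimal sketch of a hierarchy-checked mutex. The class `HierarchicalMutex` and the helper `canLockInOrder` are hypothetical names for illustration only; this is not MongoDB's actual mutex implementation, and it assumes locks are released in reverse order of acquisition.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical illustration of hierarchical locking; not MongoDB's code.
class HierarchicalMutex {
public:
    explicit HierarchicalMutex(int level) : _level(level) {}

    void lock() {
        // A thread may only take a mutex whose level is strictly lower
        // than the lowest level it already holds.
        if (_level >= lowestHeldLevel) {
            throw std::logic_error("mutex hierarchy violated");
        }
        _mutex.lock();
        _previousLevel = lowestHeldLevel;
        lowestHeldLevel = _level;
    }

    void unlock() {
        // Assumption: locks are released in reverse acquisition order.
        assert(lowestHeldLevel == _level);
        lowestHeldLevel = _previousLevel;
        _mutex.unlock();
    }

private:
    std::mutex _mutex;
    const int _level;
    int _previousLevel = INT_MAX;
    // Lowest level currently held by the calling thread.
    static thread_local int lowestHeldLevel;
};

thread_local int HierarchicalMutex::lowestHeldLevel = INT_MAX;

// Returns true if this thread may lock `first` and then `second`.
bool canLockInOrder(HierarchicalMutex& first, HierarchicalMutex& second) {
    std::lock_guard<HierarchicalMutex> outer(first);
    try {
        std::lock_guard<HierarchicalMutex> inner(second);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

With a level 6 and a level 1 mutex, taking level 6 first and level 1 second is accepted, while the reverse order is rejected. A destructor that takes the level 6 mutex while the thread already holds a lower-level one would fall into the rejected case.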
| Comments |
| Comment by Githook User [ 27/May/21 ] |
|
Author: Andrew Shuvalov (shuvalov-mdb) <andrew.shuvalov@mongodb.com> Message: |
| Comment by Andrew Shuvalov (Inactive) [ 21/May/21 ] |
|
Requesting backports. |
| Comment by Githook User [ 21/May/21 ] |
|
Author: Andrew Shuvalov (shuvalov-mdb) <andrew.shuvalov@mongodb.com> Message: |
| Comment by Andrew Shuvalov (Inactive) [ 14/May/21 ] |
|
The way to solve this is to add one more mutex, at level 1, which guards a "pending" garbage removal. To add a pending removal, lock only the level 1 mutex, so no deadlock is possible. To retrieve data from the cache, first lock the level 6 mutex, then the nested level 1 mutex, then check for pending garbage removal and perform it if necessary. Then return the result while still holding both mutexes, acquired in the proper order. Please contact me for more details. |
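The scheme described in this comment can be sketched roughly as follows. This is a simplified illustration, not the committed change: `Monitor`, `MonitorCache`, `markForRemoval`, `getOrCreate`, and `cachedEntries` are hypothetical names, plain `std::mutex` stands in for the level 6 and level 1 mutexes, and the cache is assumed to hold weak pointers keyed by replica set name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a ReplicaSetMonitor.
struct Monitor {
    std::string setName;
};

class MonitorCache {
public:
    // Destructor path: touches only the "level 1" mutex, so it is safe
    // to call no matter which higher-level mutexes the thread holds.
    void markForRemoval(const std::string& setName) {
        std::lock_guard<std::mutex> pendingLk(_pendingMutex);
        _pendingRemoval.push_back(setName);
    }

    // Lookup path: "level 6" first, then the nested "level 1" mutex.
    std::shared_ptr<Monitor> getOrCreate(const std::string& setName) {
        std::lock_guard<std::mutex> cacheLk(_cacheMutex);
        std::lock_guard<std::mutex> pendingLk(_pendingMutex);
        drainPending();
        auto& slot = _cache[setName];
        auto monitor = slot.lock();
        if (!monitor) {
            monitor = std::make_shared<Monitor>(Monitor{setName});
            slot = monitor;
        }
        return monitor;  // both mutexes are still held at this point
    }

    std::size_t cachedEntries() {
        std::lock_guard<std::mutex> cacheLk(_cacheMutex);
        return _cache.size();
    }

private:
    // Caller must hold both mutexes.
    void drainPending() {
        for (const auto& name : _pendingRemoval) {
            auto it = _cache.find(name);
            // Erase only entries whose monitor is really gone.
            if (it != _cache.end() && it->second.expired()) {
                _cache.erase(it);
            }
        }
        _pendingRemoval.clear();
    }

    std::mutex _cacheMutex;    // stands in for the level 6 mutex
    std::mutex _pendingMutex;  // stands in for the new level 1 mutex
    std::map<std::string, std::weak_ptr<Monitor>> _cache;
    std::vector<std::string> _pendingRemoval;
};

// Usage: a dead entry stays in the map until the next lookup drains it.
inline void exampleUsage() {
    MonitorCache cache;
    cache.getOrCreate("rs0");     // temporary shared_ptr dies immediately
    cache.markForRemoval("rs0");  // cheap: level 1 mutex only
    assert(cache.cachedEntries() == 1);
    cache.getOrCreate("rs1");     // drains "rs0", then inserts "rs1"
    assert(cache.cachedEntries() == 1);
}
```

The design choice here is that the frequent destructor path only records work to be done, and the actual cleanup is deferred to a path that already takes the level 6 mutex in the correct order.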