[SERVER-69523] Allow METADATA and MUTEX locks to be acquired while holding an oplog hole Created: 08/Sep/22 Updated: 29/Oct/23 Resolved: 15/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-2144-Milestone-0 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding EMEA 2022-09-19 |
| Participants: |
| Description |
|
Locks of type RESOURCE_MUTEX are intended to serve as actual mutexes, and no blocking work should be performed while they are held. Sharding already uses them to protect its in-memory structures. This ticket is to change the invariant that forbids acquiring locks while holding an oplog hole so that it excludes RESOURCE_MUTEX. |
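The proposed change can be pictured as a check on the lock-acquisition path. The following is a minimal, hypothetical C++ sketch, not the server's actual code. The `ResourceType` enum is a simplified stand-in for the server's resource categories. `mayAcquireWhileHoldingOplogHole()` and `onLockAcquire()` are invented names. Plain `assert()` stands in for the server's `invariant()`. Following the ticket title, it exempts both the METADATA and MUTEX categories.

```cpp
#include <cassert>

// Simplified stand-in for the server's lock resource categories
// (illustrative only; not the real MongoDB enum).
enum class ResourceType {
    kGlobal,
    kDatabase,
    kCollection,
    kMetadata,  // plays the role of RESOURCE_METADATA
    kMutex,     // plays the role of RESOURCE_MUTEX
};

// Hypothetical predicate: which resource categories may still be locked
// while the operation holds an oplog hole (the exemption proposed here).
bool mayAcquireWhileHoldingOplogHole(ResourceType type) {
    switch (type) {
        case ResourceType::kMetadata:
        case ResourceType::kMutex:
            return true;   // exempt: intended for short, mutex-like sections
        default:
            return false;  // all other lock types remain disallowed
    }
}

// Hypothetical hook on the lock-acquisition path. assert() stands in for
// the server's invariant(); unlike invariant(), it is compiled out when
// NDEBUG is defined.
void onLockAcquire(bool holdsOplogHole, ResourceType type) {
    assert(!holdsOplogHole || mayAcquireWhileHoldingOplogHole(type));
    (void)holdsOplogHole;  // avoid unused-parameter warnings under NDEBUG
    (void)type;
}
```

Keying the exemption on the whole resource category is the design choice Max Hirschhorn questions in his comment: any component that takes a RESOURCE_MUTEX around longer-running work would be exempted as well.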
| Comments |
| Comment by Githook User [ 15/Sep/22 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}Message: This makes mutex acquisitions both interruptible and allows the lock |
| Comment by Githook User [ 13/Sep/22 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}Message: |
| Comment by Kaloian Manassiev [ 09/Sep/22 ] |
|
RESOURCE_MUTEXes are effectively a fancy std::mutex with some extra benefits, such as the ability to lock something by name, lock stats tracking, and presentation in the locking graphs. Because of this, no blocking work should actually be done under them, and ideally no further locks should be acquired after a RESOURCE_MUTEX is taken (even though we don't currently obey this, since we also use them as a way to ensure that only a single kind of DDL operation runs at a time). One of the std::mutex-like usages is the protection of the DSS/CSS state, and that runs in the OpObservers, so there is actually an oplog hole held, but because … So yes, there is some hypothetical concern, but it is no different than running a query while holding a std::mutex, for example. In addition, I want to point out that the same happens with the RESOURCE_METADATA locks that we use. |
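To make the "fancy std::mutex" comparison concrete, here is a hypothetical sketch of a mutex that can be locked by name and that counts acquisition requests for stats reporting. `NamedMutexRegistry`, its `Guard` type, and `acquisitions()` are invented for illustration. This is not how the MongoDB lock manager is implemented, and it omits interruptibility and lock-graph reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical illustration only: a std::mutex that is looked up by name
// and whose acquisition requests are counted.
class NamedMutexRegistry {
public:
    // RAII guard returned by lock(); releases the named mutex when destroyed.
    class Guard {
    public:
        explicit Guard(std::mutex& m) : lock_(m) {}

    private:
        std::unique_lock<std::mutex> lock_;
    };

    // Blocks until the mutex registered under `name` is acquired, creating
    // the mutex on first use and recording the request for stats.
    Guard lock(const std::string& name) {
        assert(!name.empty());  // resources are identified by a non-empty name
        Entry* entry = nullptr;
        {
            std::lock_guard<std::mutex> registryLock(registryMutex_);
            entry = &entries_[name];  // std::map nodes are never relocated
            ++entry->acquisitions;
        }
        // The registry mutex is released before waiting on the named mutex.
        return Guard(entry->mutex);
    }

    // Number of lock() calls made so far for `name` (0 if never locked).
    std::size_t acquisitions(const std::string& name) const {
        std::lock_guard<std::mutex> registryLock(registryMutex_);
        auto it = entries_.find(name);
        return it == entries_.end() ? 0 : it->second.acquisitions;
    }

private:
    struct Entry {
        std::mutex mutex;
        std::size_t acquisitions = 0;
    };

    mutable std::mutex registryMutex_;
    std::map<std::string, Entry> entries_;
};
```

A caller would hold the returned guard only for a short critical section, e.g. `{ auto guard = registry.lock("dss"); /* update in-memory state */ }`, which matches the comment's point that no blocking work belongs under such a lock.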
| Comment by Max Hirschhorn [ 08/Sep/22 ] |
|
This sounds prone to stalling replication if RESOURCE_MUTEX is used differently by a component in the server codebase. Is adding an exemption at the level of the RESOURCE_MUTEX resource category the appropriate place? |