[SERVER-69523] Allow METADATA and MUTEX locks to be acquired while holding an oplog hole Created: 08/Sep/22  Updated: 29/Oct/23  Resolved: 15/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: PM-2144-Milestone-0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-69461 The behaviour of lock acquisitions di... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding EMEA 2022-09-19
Participants:

 Description   

Locks of type RESOURCE_MUTEX are intended to serve as actual mutexes and no blocking work should be performed while they are held. Sharding already uses them in order to protect its in-memory structures.

This ticket is to change this and this invariant to exclude RESOURCE_MUTEX.
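
A minimal sketch of the kind of exemption described here, assuming the invariant is a check made at lock acquisition time. The enum is a simplified stand-in for the server's resource categories, and `allowedDuringOplogHole` and `acquisitionPermitted` are hypothetical names, not the server's actual functions:

```cpp
// Simplified stand-in for the lock manager's resource categories
// (assumption: the real enum has more members than shown here).
enum ResourceType {
    RESOURCE_GLOBAL,
    RESOURCE_DATABASE,
    RESOURCE_COLLECTION,
    RESOURCE_METADATA,
    RESOURCE_MUTEX
};

// Hypothetical helper: categories that may still be locked while an
// oplog hole is held, per the title of this ticket.
constexpr bool allowedDuringOplogHole(ResourceType type) {
    return type == RESOURCE_METADATA || type == RESOURCE_MUTEX;
}

// Hypothetical form of the invariant's condition: an acquisition is fine
// if no oplog hole is held, or if the resource category is exempt.
constexpr bool acquisitionPermitted(ResourceType type, bool holdingOplogHole) {
    return !holdingOplogHole || allowedDuringOplogHole(type);
}
```

Under these assumptions, a RESOURCE_COLLECTION acquisition made while an oplog hole is held would still trip the invariant, while RESOURCE_MUTEX and RESOURCE_METADATA acquisitions would not.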



 Comments   
Comment by Githook User [ 15/Sep/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-69523 Only allow METADATA and MUTEX lock acquisitions with OpContext

This makes mutex acquisitions interruptible and allows the lock
manager to inspect the lock acquisitions made so far.
Branch: master
https://github.com/mongodb/mongo/commit/fa920335ed6a41efa417e5c41940cd28a4a36829

Comment by Githook User [ 13/Sep/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-69523 Allow METADATA and MUTEX locks during oplog hole
Branch: master
https://github.com/mongodb/mongo/commit/5a6d6a46b809a2a35099b18e1ee693e70a67f28d

Comment by Kaloian Manassiev [ 09/Sep/22 ]

RESOURCE_MUTEXes are effectively a fancy std::mutex with some benefits, such as the ability to lock something by name, lock stats tracking, and presentation in the locking graphs. Because of this, no blocking work should actually be done under them, and ideally no further locks should be acquired after a RESOURCE_MUTEX is taken (even though we don't currently obey this, since we are also using them as a way to ensure that only a single kind of DDL operation runs at a time).

One of the std::mutex-like usages is the protection of the DSS/CSS state, which runs in the OpObservers, so there is actually an oplog hole held, but because of SERVER-69461 we are not actually invarianting.

So yes, there is some hypothetical concern, but it is no different than running a query while holding an std::mutex for example.

In addition, I want to point out that the same happens with the RESOURCE_METADATA locks that we use.
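
To illustrate the "fancy std::mutex" comparison above, here is a minimal standard-library sketch of a mutex that can be locked by name and that counts acquisitions. `NamedMutexRegistry` and its methods are hypothetical and do not correspond to the server's lock manager API:

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry: one std::mutex per name, plus a per-name
// acquisition counter standing in for lock stats.
class NamedMutexRegistry {
public:
    // Blocks until the mutex registered under `name` is acquired and
    // returns an RAII guard that releases it on destruction.
    std::unique_lock<std::mutex> lock(const std::string& name) {
        Entry* entry;
        {
            // Protects the table itself; std::map never relocates its
            // nodes, so the pointer stays valid after this scope ends.
            std::lock_guard<std::mutex> tableLock(_tableMutex);
            entry = &_entries[name];
            ++entry->acquisitions;
        }
        return std::unique_lock<std::mutex>(entry->mutex);
    }

    // Number of times `name` has been locked so far (0 if never).
    std::uint64_t acquisitions(const std::string& name) {
        std::lock_guard<std::mutex> tableLock(_tableMutex);
        auto it = _entries.find(name);
        return it == _entries.end() ? 0 : it->second.acquisitions;
    }

private:
    struct Entry {
        std::mutex mutex;
        std::uint64_t acquisitions = 0;
    };

    std::mutex _tableMutex;
    std::map<std::string, Entry> _entries;
};
```

As with a plain std::mutex, the guard returned by `lock` should be held only for short, non-blocking critical sections.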

Comment by Max Hirschhorn [ 08/Sep/22 ]

This sounds prone to stalling replication if RESOURCE_MUTEX is used differently by a component in the server codebase. Is adding an exemption at the level of the RESOURCE_MUTEX resource category the appropriate place?

Generated at Thu Feb 08 06:13:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.