[SERVER-68868] Remove all instances of UninterruptibleLockGuard Created: 16/Aug/22  Updated: 14/Aug/23

Status: Blocked
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Andy Schwerin Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 1
Labels: techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-71610 [StorEx] Remove or document instances... Closed
depends on SERVER-68874 Consider making waitAfterPinningCurso... Closed
depends on SERVER-71444 [Sharding] Remove or document instanc... Open
depends on SERVER-68867 Use linter to prevent new instances o... Closed
depends on SERVER-71441 [Query] Remove or document instances ... Closed
depends on SERVER-71443 [Replication] Remove or document inst... Closed
Related
related to SERVER-69506 An InterruptibleLockGuard should be p... Backlog
related to SERVER-68867 Use linter to prevent new instances o... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2022-11-14, Execution Team 2022-12-12, Execution Team 2022-11-28
Participants:

 Description   

Uses of UninterruptibleLockGuard indicate places in the code that do not comply with MongoDB's requirement that all operations be interruptible at places where they block to wait for resources. Every one of them is a potential future deadlock, and adds complexity to other parts of the codebase. We should reimplement codepaths that depend on UninterruptibleLockGuard so as to be interruptible.
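
To illustrate what "interruptible at places where they block" means, here is a minimal standalone sketch using only the C++ standard library. It is not MongoDB's lock manager: `OperationToken` and `InterruptibleLock` are hypothetical names invented for this example, and the 10 ms polling interval is an arbitrary assumption. The point is only that a waiter re-checks an interrupt flag while blocked instead of waiting unconditionally.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a per-operation interrupt flag; not a MongoDB type.
struct OperationToken {
    std::atomic<bool> killed{false};
};

// Hypothetical lock whose acquire() gives up if the waiting operation is interrupted.
class InterruptibleLock {
public:
    // Blocks until the lock is free. Throws std::runtime_error if `op` is
    // marked killed while the lock is still held by someone else.
    void acquire(const OperationToken& op) {
        std::unique_lock<std::mutex> lk(_m);
        while (_held) {
            if (op.killed.load()) {
                throw std::runtime_error("operation interrupted while waiting for lock");
            }
            // Wake up periodically so the interrupt flag is re-checked
            // (polling interval chosen arbitrarily for this sketch).
            _cv.wait_for(lk, std::chrono::milliseconds(10));
        }
        _held = true;
    }

    // Releases the lock and wakes one waiter.
    void release() {
        {
            std::lock_guard<std::mutex> lk(_m);
            _held = false;
        }
        _cv.notify_one();
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    bool _held = false;
};
```

An uninterruptible wait would be the same loop without the `killed` check, which is why a killed operation stuck behind such a wait can contribute to a deadlock.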



 Comments   
Comment by Dianna Hohensee (Inactive) [ 23/Aug/22 ]

We'd like to investigate the uses and file tickets for removal as appropriate for whichever teams make the most sense.

Comment by Kaloian Manassiev [ 17/Aug/22 ]

The other use-case is that I want to lock something by name before it actually exists (namely DB and collections - I am referring to the DSS/CSS maps).

With a mutex I guess I could make one single mutex to cover all the namespaces, but since we perform shard version checks, etc., we don't want a single mutex to be used across all collections and databases because it will become way too hot.
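
For readers following the trade-off, one possible shape of "lock by name before it exists" without a single global mutex is sketched below. This is an assumption-laden illustration, not the DSS/CSS implementation: `NamedMutexTable` is a hypothetical class, it offers no share modes, and it never evicts entries. The table mutex is held only for the map lookup, so contention on one namespace does not block work on another.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical registry that hands out one mutex per namespace string.
class NamedMutexTable {
public:
    // Returns the mutex for `ns`, creating it on first use, so a name can be
    // locked before the corresponding database or collection exists.
    std::shared_ptr<std::mutex> get(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_tableMutex);  // held only for the lookup
        auto& slot = _entries[ns];
        if (!slot) {
            slot = std::make_shared<std::mutex>();
        }
        return slot;
    }

private:
    std::mutex _tableMutex;
    std::map<std::string, std::shared_ptr<std::mutex>> _entries;
};
```

A caller would do something like `auto m = table.get("db.coll"); std::lock_guard<std::mutex> lk(*m);`, where the namespace string is just an example value.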

Comment by Andy Schwerin [ 17/Aug/22 ]

We could make certain resource-type locks uninterruptible, I suppose. Are you using resource-mutexes because they offer share modes? Otherwise, would you be using a regular mutex? Maybe a 'non-interruptible, leaf-only resource mutex'?

Comment by Kaloian Manassiev [ 17/Aug/22 ]

What about the usages of ULG for resource-type locks, which are level-0 (i.e., no further locks are taken under them) and which serve the role of a mutex? In sharding specifically, these are necessary in order to perform cleanup in the onRollback handlers.

Perhaps we can make resource-type locks uninterruptible (just like you can't interrupt a mutex)?

Comment by Dianna Hohensee (Inactive) [ 16/Aug/22 ]

SERVER-27534 added a bunch of uses, and I really can't figure out why from the ticket, and whether the reason still exists.

Comment by Andy Schwerin [ 16/Aug/22 ]

Gregory Noma wrote (Aug 16 2022 02:18:34 PM EDT):

"I count 39 non-test usages of UninterruptibleLockGuard."

That's why I filed SERVER-68867. There used to be 5. We're going in the wrong direction.

Comment by Gregory Noma [ 16/Aug/22 ]

I count 39 non-test usages of UninterruptibleLockGuard.

Generated at Thu Feb 08 06:11:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.