[SERVER-71443] [Replication] Remove or document instances of UninterruptibleLockGuard Created: 17/Nov/22  Updated: 29/Oct/23  Resolved: 20/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Yujin Kang Park Assignee: Ali Mir
Resolution: Fixed Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-68868 Remove all instances of Uninterruptib... Blocked
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Sprint: Repl 2023-04-03, Repl 2023-04-17, Repl 2023-05-01
Participants:

 Description   

SERVER-68868 intends to remove uses of UninterruptibleLockGuard where possible. From its description:

Uses of UninterruptibleLockGuard indicate places in the code that do not comply with MongoDB's requirement that all operations be interruptible at places where they block to wait for resources. Every one of them is a potential future deadlock, and adds complexity to other parts of the codebase. We should reimplement codepaths that depend on UninterruptibleLockGuard so as to be interruptible.

The work has been split among the server teams; this ticket covers Replication.

SERVER-68867 introduced a linter rule to add friction when adding new UninterruptibleLockGuard instances and marked the existing ones with NOLINT. Where an existing instance had no comment explaining why its use is warranted, it also added a TODO referencing the corresponding server team's ticket.

We should either add a comment justifying the use of UninterruptibleLockGuard or fix the code to remove its use.
Search for "TODO (SERVER-71443)" in the code:

https://github.com/10gen/mongo/blob/34ac49477b87e183637f68cda828ecff8b393c64/src/mongo/db/repl/oplog_buffer_collection.cpp#L464
https://github.com/10gen/mongo/blob/34ac49477b87e183637f68cda828ecff8b393c64/src/mongo/db/repl/oplog_buffer_collection.cpp#L473
https://github.com/10gen/mongo/blob/34ac49477b87e183637f68cda828ecff8b393c64/src/mongo/db/repl/storage_interface_impl.cpp#L132

The following instances were initially categorised as StorEx, but should probably be done by Replication:
https://github.com/10gen/mongo/blob/34ac49477b87e183637f68cda828ecff8b393c64/src/mongo/db/transaction/transaction_participant.cpp#L1603
https://github.com/10gen/mongo/blob/34ac49477b87e183637f68cda828ecff8b393c64/src/mongo/db/transaction/transaction_participant.cpp#L1891
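
The two outcomes asked for above (keep the guard with a justifying comment, or remove it so the code path is interruptible) can be illustrated with a toy model. This is only a sketch: ToyOperationContext, ToyUninterruptibleGuard, and both functions are hypothetical stand-ins written for this example. They do not reproduce the real UninterruptibleLockGuard class, its constructor arguments, or the actual code at the links above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an operation context; NOT MongoDB's real class.
class ToyOperationContext {
public:
    void markKilled() { _killed = true; }

    // Models a blocking point: throws if the operation was killed and
    // no uninterruptible guard is currently active.
    void checkForInterrupt() const {
        if (_killed && _uninterruptibleDepth == 0) {
            throw std::runtime_error("operation interrupted");
        }
    }

private:
    friend class ToyUninterruptibleGuard;
    bool _killed = false;
    int _uninterruptibleDepth = 0;
};

// Hypothetical RAII guard: interrupts are ignored while it is in scope.
class ToyUninterruptibleGuard {
public:
    explicit ToyUninterruptibleGuard(ToyOperationContext& opCtx) : _opCtx(opCtx) {
        ++_opCtx._uninterruptibleDepth;
    }
    ~ToyUninterruptibleGuard() {
        assert(_opCtx._uninterruptibleDepth > 0);
        --_opCtx._uninterruptibleDepth;
    }
    ToyUninterruptibleGuard(const ToyUninterruptibleGuard&) = delete;
    ToyUninterruptibleGuard& operator=(const ToyUninterruptibleGuard&) = delete;

private:
    ToyOperationContext& _opCtx;
};

// Outcome 1: keep the guard, but document why it is warranted.
bool cleanupThatMustNotFail(ToyOperationContext& opCtx) {
    // This cleanup must run to completion even if the operation was killed;
    // an interrupt here would leave (hypothetical) state half-updated.
    ToyUninterruptibleGuard noInterrupt(opCtx);  // NOLINT
    opCtx.checkForInterrupt();  // simulated lock wait; does not throw under the guard
    return true;
}

// Outcome 2: remove the guard and handle the interrupt instead.
bool interruptibleWork(ToyOperationContext& opCtx) {
    try {
        opCtx.checkForInterrupt();  // simulated lock wait; may throw
        return true;
    } catch (const std::runtime_error&) {
        return false;  // caller sees that the work was interrupted
    }
}
```

In this model a killed operation still completes cleanupThatMustNotFail, while interruptibleWork reports the interruption to its caller. Which outcome fits each linked instance depends on the real call sites, which this sketch does not attempt to judge.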



 Comments   
Comment by Githook User [ 19/Apr/23 ]

Author:

{'name': 'Ali Mir', 'email': 'ali.mir@mongodb.com', 'username': 'ali-mir'}

Message: SERVER-71443 Remove or document instances of UninterruptibleLockGuard in replication components
Branch: master
https://github.com/mongodb/mongo/commit/aad09dc4ea618f95b25319a47d14f6ad46947c11

Generated at Thu Feb 08 06:19:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.