[SERVER-66719] dbCheck FCV lock upgrade causes deadlock with setFCV Created: 24/May/22  Updated: 29/Oct/23  Resolved: 25/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.2, 6.0.0-rc8, 6.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Louis Williams
Assignee: Gregory Noma
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-66145 Identify and fix locations that write... Closed
related to SERVER-60621 Investigate if we can ban upgrading t... Closed
is related to SERVER-65821 Deadlock during setFCV when there are... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: Execution Team 2022-05-30
Participants:
Linked BF Score: 169

 Description   

dbCheck, which initially only holds IS locks in the global hierarchy (except the FCV lock), upgrades its locks to IX when writing to the oplog.

This causes a deadlock with a concurrent setFCV command and a DDL operation (dropCollection in this example):

  • dbCheck holds all IS locks in the global hierarchy, including the IS collection lock, but it doesn't hold the FCV lock, which we only take in exclusive (IX, X) lock modes.
  • dropCollection holds an IX FCV lock. It waits on an X collection lock behind dbCheck.
  • setFCV waits on an S FCV lock behind dropCollection.
  • dbCheck tries to take an IX FCV lock and queues behind setFCV.
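
The four steps above form a cycle in the wait-for graph. A minimal sketch in Python, for illustration only: the `WAITS_FOR` table and `find_cycle` helper are hypothetical names and are not part of the server code. Each operation is mapped to the operation it is blocked behind, and following the edges from dbCheck leads back to dbCheck.

```python
from typing import Dict, List, Optional

# Wait-for edges taken from the bullets above: each operation maps to the
# operation it is blocked behind. (Illustrative model, not server code.)
WAITS_FOR: Dict[str, str] = {
    "dropCollection": "dbCheck",  # X collection lock vs. dbCheck's IS collection lock
    "setFCV": "dropCollection",   # S FCV lock vs. dropCollection's IX FCV lock
    "dbCheck": "setFCV",          # IX FCV request queued behind setFCV's pending S request
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        # The chain ended at an operation that is not waiting on anything.
        return None
    # `node` was seen before: the cycle runs from its first occurrence back to it.
    return path[path.index(node):] + [node]


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, "dbCheck"))
```

Removing any one edge breaks the cycle, which is why either avoiding the upgrade in dbCheck or letting dbCheck's FCV request bypass the queue would resolve the deadlock.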

We support lock upgrades in the lock manager. If the dbCheck operation had taken an IS FCV lock, its upgrade to IX would have skipped the queue ahead of the waiting setFCV command, and this deadlock would not have happened. This is described in this comment.
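
To make the queue-skipping behavior concrete, here is a toy single-resource lock in Python. It is a sketch under stated assumptions, not the server's lock manager: the `ToyLock` class and its methods are hypothetical, the compatibility table is the standard multi-granularity one, and the rule that an upgrade by an existing holder is checked only against other holders (not against the wait queue) is modeled on the description above.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Standard multi-granularity compatibility: COMPATIBLE[a] is the set of modes
# that may be held by other operations while mode a is held.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    """Hypothetical single-resource lock with a FIFO queue and upgrade bypass."""

    def __init__(self) -> None:
        self.granted: Dict[str, str] = {}               # holder -> granted mode
        self.waiting: Deque[Tuple[str, str]] = deque()  # (requester, mode)

    def _compatible_with_others(self, who: str, mode: str) -> bool:
        # Ignore the requester's own granted mode so that upgrades can succeed.
        return all(m in COMPATIBLE[mode] for h, m in self.granted.items() if h != who)

    def request(self, who: str, mode: str) -> bool:
        """Return True if granted immediately, False if queued."""
        is_upgrade = who in self.granted
        # New requests must also respect the queue; upgrades only check holders.
        if self._compatible_with_others(who, mode) and (is_upgrade or not self.waiting):
            self.granted[who] = mode
            return True
        self.waiting.append((who, mode))
        return False


if __name__ == "__main__":
    # Reported case: dbCheck holds no FCV lock, so its IX request is a new request.
    fcv = ToyLock()
    fcv.request("dropCollection", "IX")
    fcv.request("setFCV", "S")
    print("no prior IS lock:", fcv.request("dbCheck", "IX"))

    # Alternative: dbCheck already holds IS, so IX is an upgrade.
    fcv = ToyLock()
    fcv.request("dbCheck", "IS")
    fcv.request("dropCollection", "IX")
    fcv.request("setFCV", "S")
    print("with prior IS lock:", fcv.request("dbCheck", "IX"))
```

In the first scenario the IX request is queued behind setFCV's pending S request; in the second it is granted immediately because IX is compatible with the only other holder's IX lock.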

This ticket will address the deadlock concern by not upgrading the global locks in dbCheck, since this is generally dangerous.



 Comments   
Comment by Githook User [ 26/May/22 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-66719 Take FCV lock in `MODE_IS` for shared global lock modes

(cherry picked from commit 1c3268ae7fd8ffd678c20d5f2ac977be2a2c982f)
Branch: v5.3
https://github.com/mongodb/mongo/commit/faf655827b3acedf1513bc140e753a5556f2e5a3

Comment by Githook User [ 26/May/22 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-66719 Take FCV lock in `MODE_IS` for shared global lock modes

(cherry picked from commit 1c3268ae7fd8ffd678c20d5f2ac977be2a2c982f)
Branch: v6.0
https://github.com/mongodb/mongo/commit/012db1b7bd4aa161368a28ffdc581ebb102e2fbe

Comment by Githook User [ 25/May/22 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-66719 Take FCV lock in `MODE_IS` for shared global lock modes
Branch: master
https://github.com/mongodb/mongo/commit/1c3268ae7fd8ffd678c20d5f2ac977be2a2c982f

Comment by Louis Williams [ 24/May/22 ]

Since we have to backport this change to a lot of branches, I think we should always take the FCV lock in shared modes. That would fix the deadlock without requiring a change to dbCheck, since dbCheck is likely not the only problem.
