[SERVER-66719] dbCheck FCV lock upgrade causes deadlock with setFCV Created: 24/May/22 Updated: 29/Oct/23 Resolved: 25/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.2, 6.0.0-rc8, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0 |
| Sprint: | Execution Team 2022-05-30 |
| Participants: | |
| Linked BF Score: | 169 |
| Description |
|
dbCheck, which initially holds only IS locks in the global hierarchy (except the FCV lock), upgrades its locks to IX when writing to the oplog. This causes a deadlock with a concurrent setFCV command and a DDL operation (dropCollection in this example).
We support lock upgrades in the lock manager. If the dbCheck operation had taken an IS FCV lock, it would have skipped the queue ahead of the waiting setFCV command, and this deadlock would not have happened. This is described in this comment. This ticket will address the deadlock concern by not upgrading the global locks in dbCheck, since upgrading locks is generally dangerous. |
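To make the upgrade hazard concrete, here is a minimal sketch of the standard multigranularity lock compatibility rules (IS, IX, S, X). It is not the server's lock manager code: the names `COMPATIBLE`, `can_grant`, and `blockers` are hypothetical, and the S holder in the usage lines is an assumption chosen only to show a conflict, not a statement about which mode any particular command takes.

```python
# Standard multigranularity lock compatibility: for each requested mode,
# the set of already-granted modes it can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, granted_modes):
    """True if `requested` is compatible with every mode already granted."""
    return all(held in COMPATIBLE[requested] for held in granted_modes)


def blockers(target, other_holders):
    """Modes held by other operations that conflict with `target`."""
    return [mode for mode in other_holders if mode not in COMPATIBLE[target]]


# Hypothetical scenario: one operation holds IS while another holds S.
others = ["S"]
print(can_grant("IS", others))   # IS coexists with S
print(can_grant("IX", others))   # upgrading to IX would have to wait
print(blockers("IX", others))    # the S holder is what blocks the upgrade
```

The point of the sketch is that an operation that was harmless while holding IS becomes a waiter the moment it asks for IX, while still holding its original locks. If another operation is in turn waiting on one of those locks, the wait can become circular.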
| Comments |
| Comment by Githook User [ 26/May/22 ] |
|
Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com) Message: (cherry picked from commit 1c3268ae7fd8ffd678c20d5f2ac977be2a2c982f) |
| Comment by Githook User [ 26/May/22 ] |
|
Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com) Message: (cherry picked from commit 1c3268ae7fd8ffd678c20d5f2ac977be2a2c982f) |
| Comment by Githook User [ 25/May/22 ] |
|
Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com) Message: |
| Comment by Louis Williams [ 24/May/22 ] |
|
Since we have to backport this change to a lot of branches, I think we should always take the FCV lock in shared modes. That would fix the deadlock without requiring a change to dbCheck, since dbCheck is likely not the only problem. |