[SERVER-30932] dbCheck command violates lock ordering by acquiring lock on "local" database first
Created: 02/Sep/17 | Updated: 09/Dec/21 | Resolved: 09/Dec/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Josef Ahmad |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Sprint: | Execution Team 2021-12-13 |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
The AutoGetDbForDbCheck RAII class first attempts to acquire a lock on the "local" database in MODE_IX and then attempts to acquire a lock on the database to check in MODE_S. This is incompatible with the lock ordering that other database operations use when calling repl::logOp() because other threads will first attempt to acquire a lock on the database and then attempt to acquire a lock on the "local" database.
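The ordering conflict described above can be illustrated with a small toy model. This is only a sketch: `LockSequence`, `positionOf`, and `hasLockOrderInversion` are hypothetical names invented for illustration and are not part of the MongoDB code base. The model ignores lock modes entirely and only checks whether two threads take the same pair of resources in opposite order, which is the property this ticket reports.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: each thread is described by the order in which it acquires
// named resources. None of these names are MongoDB APIs.
using LockSequence = std::vector<std::string>;

// Returns the position of `name` in `seq`, or -1 if the thread never locks it.
int positionOf(const LockSequence& seq, const std::string& name) {
    auto it = std::find(seq.begin(), seq.end(), name);
    return it == seq.end() ? -1 : static_cast<int>(it - seq.begin());
}

// Returns true if some pair of resources is acquired in one order by thread
// `a` and in the opposite order by thread `b`.
bool hasLockOrderInversion(const LockSequence& a, const LockSequence& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            int bi = positionOf(b, a[i]);
            int bj = positionOf(b, a[j]);
            // `a` takes a[i] before a[j]; an inversion exists if `b` holds
            // both and takes a[j] first.
            if (bi >= 0 && bj >= 0 && bj < bi) {
                return true;
            }
        }
    }
    return false;
}
```

With a database named "test" (an arbitrary example name), the dbCheck order `{"local", "test"}` compared against the writer order `{"test", "local"}` is reported as an inversion, while two threads that both use `{"test", "local"}` are not.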
|
| Comments |
| Comment by Josef Ahmad [ 09/Dec/21 ] |
|
This issue was resolved in |
| Comment by Max Hirschhorn [ 08/Sep/17 ] |
|
geert.bosch and I discussed this issue in person. I think we're now on the same page that the locking order in the "dbCheck" thread is incorrect. It sounds like this locking order was chosen to avoid an issue with the MMAPv1 storage engine, where the MMAPv1 flush lock is taken in the same mode as the first global lock acquisition. We'll likely need to explicitly acquire the global lock in an intent mode, in addition to swapping the lock order, to work around both issues. |
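The acquisition order proposed in this comment might look roughly like the following toy sketch. It does not use real MongoDB lock classes: `EventLog`, `ScopedToyLock`, and `ToyDbCheckLocks` are hypothetical stand-ins that only record the order of events. The comment does not say which intent mode would be used for the global lock; `MODE_IX` is an assumption made here for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, not MongoDB types: a log of lock events and a scoped lock
// that appends to the log on acquisition and on release.
using EventLog = std::vector<std::string>;

class ScopedToyLock {
public:
    ScopedToyLock(EventLog& log, std::string resource, const std::string& mode)
        : _log(log), _resource(std::move(resource)) {
        _log.push_back("lock " + _resource + " " + mode);
    }
    ~ScopedToyLock() {
        _log.push_back("unlock " + _resource);
    }
    ScopedToyLock(const ScopedToyLock&) = delete;
    ScopedToyLock& operator=(const ScopedToyLock&) = delete;

private:
    EventLog& _log;
    std::string _resource;
};

// Hypothetical reordered guard: explicit global intent lock first, then the
// database being checked, then "local" last. C++ constructs members in
// declaration order and destroys them in reverse, so the release order is
// the mirror image of the acquisition order.
class ToyDbCheckLocks {
public:
    ToyDbCheckLocks(EventLog& log, const std::string& dbName)
        : _global(log, "global", "MODE_IX"),  // assumed intent mode
          _db(log, dbName, "MODE_S"),
          _local(log, "local", "MODE_IX") {}

private:
    ScopedToyLock _global;
    ScopedToyLock _db;
    ScopedToyLock _local;
};
```

Constructing a `ToyDbCheckLocks` for a database named "test" inside a scope and then leaving that scope appends six events to the log: the three acquisitions in the order global, "test", "local", followed by the releases in reverse.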
| Comment by Ian Whalen (Inactive) [ 05/Sep/17 ] |
|
Leaving in Triage for now. Geert to talk to Max about it. |