[SERVER-49369] add check for Global resource lock state to OperationContext::waitForConditionOrInterrupt() Created: 08/Jul/20  Updated: 07/Jun/22  Resolved: 07/Jan/22

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Benety Goh Assignee: Dianna Hohensee (Inactive)
Resolution: Won't Do Votes: 0
Labels: techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-48617 Concurrent 'createIndexes' can block ... Closed
is related to SERVER-67015 Investigate circumstances under which... Closed
Sprint: Execution Team 2021-11-01, Execution Team 2021-11-15, Execution Team 2021-12-27, Execution Team 2022-01-10, Execution Team 2022-01-24
Participants:

 Description   

It is generally an anti-pattern to hold a lock on either the Global or RSTL resource while waiting on a condition variable through the OperationContext.

There is currently no programmatic way to enforce this convention, which has led to issues such as SERVER-48617, in IndexBuildsCoordinatorMongod::startIndexBuild().

We should look into introducing an invariant to enforce this convention for most usages of the OperationContext's condition-variable wait. There will be a few exceptions to consider - one such case is the LockManager's internal lock acquisition logic.
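
A minimal sketch of what such an invariant might look like, assuming a check at the top of waitForConditionOrInterrupt(); the Locker accessors and the opt-out flag shown here (e.g. for the LockManager-internal exception) are illustrative assumptions, not the actual OperationContext/Locker API:

    // Hypothetical sketch only; _allowWaitWhileLocked and the exact Locker
    // accessors are assumptions for illustration.
    template <typename Pred>
    void OperationContext::waitForConditionOrInterrupt(stdx::condition_variable& cv,
                                                       stdx::unique_lock<Latch>& lk,
                                                       Pred pred) {
        // Proposed convention check: most callers should not be holding the
        // Global or RSTL resource while parked on a condition variable, because
        // a thread enqueuing an exclusive (strong) lock would then have to wait
        // behind this sleeping thread's intent lock.
        if (!_allowWaitWhileLocked) {
            invariant(!lockState()->isLocked());      // Global resource
            invariant(!lockState()->isRSTLLocked());  // RSTL resource
        }

        // Existing behavior: wait until the predicate is satisfied or the
        // operation is interrupted (killed, deadline exceeded, etc.).
        waitForConditionOrInterruptUntil(cv, lk, Date_t::max(), std::move(pred));
    }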



 Comments   
Comment by Dianna Hohensee (Inactive) [ 07/Jan/22 ]

I'm closing this ticket as Won't Do. I'm not convinced that holding a lock across interruptible waits is an anti-pattern. Rather, it's likely not ideal, but there appears to be good reason for the usages. Nothing that turned up in Gabriel's extensive investigation seems alarming.

Instead of removing such usages, I think we will progress towards an increasingly lock-free system, which helps. Callers of OperationContext::waitForConditionOrInterrupt() will receive interrupts whenever code requires the RSTL or Global resource in exclusive write mode – shutdown and repl state change both enqueue a strong lock, send out interrupt signals to waitForConditionOrInterrupt() callers, and then wait for strong lock acquisition (i.e. for all the intent locks to be released).
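
As a hedged illustration of that flow from the waiter's side (the mutex, condition variable, and predicate names here are hypothetical):

    // Typical interruptible-wait pattern on the opCtx (illustrative names).
    stdx::unique_lock<Latch> lk(_mutex);
    opCtx->waitForConditionOrInterrupt(_cv, lk, [&] { return _ready; });
    // If shutdown or a repl state change enqueues the RSTL/Global lock in an
    // exclusive mode, it first marks this operation as killed; the wait above
    // then throws an interruption exception instead of sleeping until the
    // condition is signalled.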

The work in SERVER-48617, which spawned this ticket/idea for investigation, ran into different issues around the use of admission control and deadlocking. It does sound like repl and sharding's usage of spurious global IX acquisitions in order to get threads interrupted by stepdown might warrant more consideration, to ensure it's safe.

Comment by Benety Goh [ 29/Nov/21 ]

Thanks for the thorough investigation. The Google document is very informative, and we should keep it around for future reference.

dianna.hohensee, I believe that the original ticket in SERVER-48617 was trying to avoid the overlapping resource and mutex lock scopes. The write ticket exhaustion (which may be addressed as we rethink admission control/write and read ticketing) was a side effect of the incorrect locking strategy and not really a primary focus.

The issues with the interaction between the find command and the JS interpreter for the $where operator can be deferred, as we favor aggregation operators over JS script execution for queries.

In terms of ticketing, it might be overwhelming to file a slew of tickets that might all get the same treatment at triage time. How about selecting one of the easy wins from the Google document for a ticket that we can discuss at the next triage meeting?

Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ]

benety.goh Any thoughts based on what the investigation has turned up? I'm pondering whether we should go ahead with creating tickets for all of these instances, or maybe a subset. Also a bit uncertain whether the focus of this task is to avoid write ticket exhaustion, or avoid waiting on the opCtx while holding any/write/read lock? Or both?

Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ]

SERVER-60682 was completed recently, to skip taking a write ticket in

  • refreshOplogTruncateAfterPointIfPrimary (JournalFlusher)
  • commit of a prepared transaction.

Some cases can be, or already are, solved that way with regard to ticket exhaustion concerns, though not the pattern of waiting while holding locks.

Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ]

Gabriel's investigation results can be seen in this Google document.

A summary of problem points:

  • $where creating a JS function in the Find cmd path.
  • Data throttling abstractions in the repair and validate command code paths.
  • Something in tenant migrations. Uses Futures, unclear, requires further investigation.
  • Replication state transitions.
  • Server shutdown.
  • refreshOplogTruncateAfterPointIfPrimary rolls its own sleep logic on MDB-generated WCEs, due to the validate cmd possibly running WT::verify on the oplog collection and invalidating WT cursors. This can be refactored if need be.
  • Sharding utilities can decide not to release locks for network calls. Undocumented, further investigation required as to why.
  • Anywhere we await oplog visibility: waitForAllEarlierOplogWritesToBeVisible.
  • Anything using wiredTigerPrepareConflictRetry, which calls waitUntilPreparedUnitOfWorkCommitsOrAborts (a usage sketch follows below).
    • WiredTigerRecordStore::findRecord uses wiredTigerPrepareConflictRetry.

This summary does not distinguish whether read or write tickets are being used for these operations – we would have to do another pass. SERVER-48617, mentioned as the inspiration for this SERVER ticket, had an issue with write ticket exhaustion, not read.
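
For reference, a hedged sketch of the prepare-conflict retry pattern mentioned in the list above; the cursor variable and surrounding context are assumed for illustration, and the exact call sites/signatures may differ:

    // Illustrative only: how a caller such as WiredTigerRecordStore::findRecord
    // might drive the retry helper.
    int ret = wiredTigerPrepareConflictRetry(opCtx, [&] {
        // Re-attempted each time the previous attempt hit a prepare conflict;
        // between attempts the helper waits until the prepared unit of work
        // commits or aborts, while the caller's locks/tickets stay held.
        return cursor->search(cursor);
    });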
