[SERVER-49369] add check for Global resource lock state to OperationContext::waitForConditionOrInterrupt() Created: 08/Jul/20 Updated: 07/Jun/22 Resolved: 07/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Benety Goh | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | techdebt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Sprint: | Execution Team 2021-11-01, Execution Team 2021-11-15, Execution Team 2021-12-27, Execution Team 2022-01-10, Execution Team 2022-01-24 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
It is generally an anti-pattern to be holding a lock on either the Global or RSTL resources when waiting on a condition variable through the.OperationContext. There is currently no programmatic way to enforce this convention, which has led to issues such as We should look into introducing in invariant to enforcing this convention for most usages of OperationContext as a condition variable. There will be a few exceptions to consider - one such case is in the LockManager's internal lock acquisition logic. |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 07/Jan/22 ] |
|
I'm closing this ticket as Won't Do. I'm not convinced holding a lock across interruptible waits is an anti-pattern. Rather, it's likely not ideal, but there appears to be good reason for the usages. Nothing turned up in Gabriel's extensive investigation seems alarming. Instead of removing such usages, I think we will progress towards an increasingly lock-free system, which helps. Callers of OperationContext::waitForConditionOrInterrupt() will receive interrupts whenever code requires a RSTL or global exclusive write lock – shutdown, repl state change, all enqueue a strong lock, send out interrupt signals to waitForConditionOrInterrupt() callers, and then wait for strong lock acquisition (i.e. all the intent locks to be released). The work in |
| Comment by Benety Goh [ 29/Nov/21 ] |
|
Thanks for the thorough investigation. The google document is very informative and we should keep it around for future reference. dianna.hohensee, I believe that the original ticket in The issues with the interaction between the find command and the JS interpreter for the $where operator can be deferred as we favor aggregation operator over JS script execution for queries. In terms of ticketing, it might be overwhelming to file a slew of tickets that might all get the same treatment at triage time. How about selecting one of the easy wins from the google document for a ticket that we can discuss at the next triage meeting? |
| Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ] |
|
benety.goh Any thoughts based on what the investigation has turned up? I'm pondering whether we should go ahead with creating tickets for all of these instances, or maybe a subset. Also a bit uncertain whether the focus of this task is to avoid write ticket exhaustion, or avoid waiting on the opCtx while holding any/write/read lock? Or both? |
| Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ] |
|
Some cases can be / are solved that way, in regards to ticket exhaustion concerns, not particularly the pattern of waiting while holding locks. |
| Comment by Dianna Hohensee (Inactive) [ 24/Nov/21 ] |
|
Gabriel's investigation results can be seen in this google document. A summary of problem points:
This summary does not distinguish whether read or write tickets are being used for these operations – would have to do another pass. |