[SERVER-71198] Assert that unkillable operations that take X collection locks do not hold the RSTL Created: 09/Nov/22 Updated: 19/Jul/23
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Sprint: | Execution Team 2022-12-12, Execution Team 2022-12-26, Execution Team 2023-01-09 |
| Description |
The deadlocks described in
In this situation, the operation that isn't interrupted by stepdown cannot make progress. We should add an assertion to our lock helpers that prevents unkillable operations from taking X locks while also holding the RSTL.
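The proposed assertion can be sketched as follows. This is a minimal toy model, not MongoDB's actual lock manager: the type names (`OperationContext`, `LockMode`, `acquireCollectionLock`) are illustrative stand-ins, and the invariant is the one stated in the description — an operation that stepdown cannot interrupt must not request a collection MODE_X lock while it holds the RSTL.

```cpp
#include <cassert>

// Illustrative stand-ins for server types; not the real API.
enum class LockMode { IS, IX, S, X };

struct OperationContext {
    bool killableByStepdown = true;  // can stepdown interrupt this op?
    bool holdsRSTL = false;          // is the RSTL currently held?
};

// The deadlock-prone combination this ticket wants to assert against:
// an unkillable op holding the RSTL and requesting a collection X lock.
bool wouldViolateInvariant(const OperationContext& opCtx, LockMode requested) {
    return !opCtx.killableByStepdown && opCtx.holdsRSTL &&
           requested == LockMode::X;
}

// Sketch of where the check would live in a lock helper.
void acquireCollectionLock(OperationContext& opCtx, LockMode mode) {
    assert(!wouldViolateInvariant(opCtx, mode) &&
           "unkillable op must not take a collection X lock while holding RSTL");
    // ... actual lock acquisition would happen here ...
}
```

The check fires only when all three conditions coincide; weaker modes (e.g. IX) or killable operations pass through untouched.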
| Comments |
| Comment by Jordi Olivares Provencio [ 30/Mar/23 ] |
As of
| Comment by Jordi Olivares Provencio [ 25/Jan/23 ] |
In this case, the OplogApplier holds the RSTL while applying operations; it needs it to protect against replica set state changes. And yes, as a result of applying those operations it will inevitably have to hold MODE_X locks to perform DDL operations on collections. Just as importantly, the OplogApplier threads are not marked as killable and thus fall under the "unkillable system connection" umbrella. The entire initial sync machinery likewise takes the RSTL and acquires strong MODE_X locks.

The biggest problem I found is that, oddly enough, the mechanism for marking a client connection's killability is inverted: by default all operations are unkillable and are then explicitly marked as killable via Client::canKillSystemOperationInStepdown. Finding all unkillable system connections therefore cannot be done by searching for calls to that method. The only way to "find" the unkillable clients is to search for calls to ServiceContext::makeClient and look for call sites that don't pass a session handle.
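The inverted default described above can be sketched like this. It is a simplified toy, not the server's actual Client class (the real accessor mentioned in the comment is Client::canKillSystemOperationInStepdown; the setter name here is hypothetical): clients start out unkillable, and killability is an explicit opt-in, which is why grepping finds the killable clients rather than the unkillable ones.

```cpp
#include <cassert>

// Simplified sketch; not the server's actual class.
class Client {
public:
    // Queried during stepdown to decide whether to interrupt the op.
    bool canKillSystemOperationInStepdown() const {
        return _killableByStepdown;
    }

    // The explicit opt-in (hypothetical name). Because the default is
    // "unkillable", searching for calls to this finds the killable
    // clients, not the unkillable ones.
    void markKillableByStepdown() { _killableByStepdown = true; }

private:
    bool _killableByStepdown = false;  // inverted default: unkillable
};
```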
| Comment by Max Hirschhorn [ 24/Jan/23 ] |
Great, tell me more about this! What is the sequence of lock acquisitions during secondary oplog application you're describing? Is it that secondary oplog application takes the RSTL IX lock in an uninterruptible Client and then also takes a collection X lock when applying certain ops? Is the RSTL lock relevant for how secondary oplog application must run, or is it that we lack a different mechanism for the writer threads to opt out of it? I'm fully bought in that prepared transactions wouldn't be part of a deadlock on secondaries, because secondaries don't hold any locks for prepared transactions. I'd like to understand the lock story more in general, because other collection locks may still be taken on a secondary by operations which aren't related to prepared transactions. |
| Comment by Jordi Olivares Provencio [ 23/Jan/23 ] |
max.hirschhorn@mongodb.com Secondaries not holding locks means the described deadlock cannot happen on them; only primaries hold the locks and can deadlock. This is because only primaries need to coordinate writes. Secondaries apply all the writes atomically, since they receive the full set of transaction operations from the primary when it commits there; it's as if they perform an applyOps with all the writes. Checking for the conditions described in this ticket therefore requires knowing the current node's role in the replica set, since the check would be a false positive on secondaries. This is the layering violation I mentioned: the locking layer would need to know about replication. |
| Comment by Max Hirschhorn [ 20/Jan/23 ] |
Thanks for reopening this ticket. jordi.olivares-provencio@mongodb.com, would you mind adding more detail as to how the replica set member state enters into the picture? My understanding is the following -
What is it about prepared transactions and how they are applied by secondaries that complicates checking the conditions stated in the description? What is it about secondaries not holding locks for prepared transactions that results in a violation of those conditions? |
| Comment by Jordi Olivares Provencio [ 20/Jan/23 ] |
Re-opening and putting in the backlog. The patch used to investigate this can be found here: diff.patch |
| Comment by Jordi Olivares Provencio [ 20/Jan/23 ] |
Adding the considered assertion requires a layering violation during locking. The invariant suggested here requires distinguishing whether the node is in a primary state, since prepared transactions will only acquire locks on a primary node. This is a layering violation, as the locking component shouldn't have to know anything about the replication layer above it.

The reason the check is only valid on a primary node comes from the following observation in the Replication readme (emphasis added): "Once stepdown finishes, the node will yield locks from all prepared transactions since secondaries don't hold locks for their transactions."

As part of this project we discovered only one potential deadlock in

Given all the above, we are closing this ticket as Won't Fix. |
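The layering problem can be made concrete with a small sketch. All names here are illustrative, not server code: the point is that the dangerous combination only exists on a primary (secondaries don't hold locks for prepared transactions), so a correct lock-layer check would need the node's replica set role, which the locking component is not supposed to know.

```cpp
#include <cassert>

// Illustrative only; not a real server type.
enum class MemberState { Primary, Secondary };

// The invariant guarded by the replica set role. Without the MemberState
// argument it would fire spuriously on secondaries (e.g. during oplog
// application), which is exactly the layering violation: the lock layer
// would have to query replication state to avoid the false positive.
bool invariantWouldFire(MemberState state, bool unkillable, bool holdsRSTL,
                        bool takesCollectionX) {
    return state == MemberState::Primary && unkillable && holdsRSTL &&
           takesCollectionX;
}
```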