Any thread that takes both the PBWM (the ParallelBatchWriterMode lock) and a ticket can introduce a three-way deadlock in the presence of prepared transactions.
The deadlock can be summarised like this:
- T1 (Regular secondary read threads): N of them (where N is the number of available tickets) acquire tickets, but end up waiting on a prepared transaction to commit. That transaction's commit is in a subsequent batch, so these threads will not be unblocked until oplog application makes forward progress.
- T2 (Internal thread): Takes the PBWM in some intent mode (IS) and blocks waiting on a ticket acquisition.
- T3 (Oplog application thread): Tries to take the PBWM in mode X, but blocks behind T2, which holds the PBWM.
We now have a wait cycle of (T3) -> (T2) -> (T1) -> (T3).
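The cycle can be checked mechanically by treating the three participants as nodes in a wait-for graph. The following is only an illustrative sketch, not server code: `find_cycle` and the `waits_for` mapping are hypothetical names, T1 stands in for all N ticket-holding readers, and each node is assumed to wait on at most one other node.

```python
def find_cycle(waits_for):
    """Return the nodes of a cycle in a wait-for graph, or None.

    `waits_for` maps each waiter to the single node it is blocked on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                # Walked back to where we began: this is a wait cycle.
                return path + [start]
            if node in seen:
                # Ran into a cycle that does not include `start`.
                break
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None


# Edges taken from the description above:
#   T3 (oplog application) waits on T2, which holds the PBWM.
#   T2 (internal thread) waits on T1, which holds all the tickets.
#   T1 (secondary readers) waits on T3 to apply the commit.
waits_for = {"T3": "T2", "T2": "T1", "T1": "T3"}

cycle = find_cycle(waits_for)
print(" -> ".join(cycle) if cycle else "no cycle")
```

Removing any one edge, for example by having T2 acquire its ticket before taking the PBWM, leaves the graph acyclic in this model.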
This demonstrates that it is not safe to block on a ticket acquisition while holding the PBWM. There are at least two places where this can happen:
- ShardFilteringMetadataRefresh: Takes both PBWM and a ticket
- ShardServerCatalogCacheLoader: Takes both PBWM and a ticket
This (more general) ticket is for the StorEx team to tighten up the locking rules in order to prevent deadlocks like these from being introduced. One possibility is to add an invariant that a thread cannot hold both the PBWM and a ticket: it must choose one.
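As a rough model of what such an invariant could look like, the sketch below tracks the two resources per operation and fails as soon as the second one is requested. Every name here (`OperationModel`, `LockingInvariantViolation`, `acquire_pbwm`, `acquire_ticket`) is hypothetical and chosen for illustration; none of it reflects the server's actual locking API.

```python
class LockingInvariantViolation(Exception):
    """Raised when an operation would hold both the PBWM and a ticket."""


class OperationModel:
    """Hypothetical stand-in for per-operation locking state."""

    def __init__(self):
        self.holds_pbwm = False
        self.holds_ticket = False

    def acquire_pbwm(self):
        # Invariant: the PBWM may not be taken by a ticket holder.
        if self.holds_ticket:
            raise LockingInvariantViolation(
                "PBWM requested while holding a ticket")
        self.holds_pbwm = True

    def acquire_ticket(self):
        # Invariant: a ticket may not be taken by a PBWM holder.
        if self.holds_pbwm:
            raise LockingInvariantViolation(
                "ticket requested while holding the PBWM")
        self.holds_ticket = True


# Mimic T2 from the description: PBWM first, then a ticket.
op = OperationModel()
op.acquire_pbwm()
try:
    op.acquire_ticket()
    violated = False
except LockingInvariantViolation:
    violated = True
print(violated)
```

Failing at acquisition time, instead of at the point where the thread would block, means an offending code path is caught on every run and not only when tickets happen to be exhausted.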
Related to:
- SERVER-76835 Deadlock in the shard filtering metadata refresh path (Closed)
- SERVER-83786 Tighten the guarantee that flow control and ticket acquisition doesn't lead to circular waits (Backlog)
- SERVER-75262 Add a passthrough test that exercises ticket exhaustion (Closed)