Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Replication, Storage
Fully Compatible
ALL
Repl 2019-05-06, Repl 2019-05-20
We do not kill user or internal reads on step down, just as we do not kill internal writes on step down. If such a read hits a prepare conflict, it causes the deadlock described in SERVER-40594.
Solution:
- Stepdown and step up will now require RSTL in mode S.
- Rollback requires RSTL in mode X.
- Reads take RSTL in mode IS.
- Writes take RSTL in mode IX.
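The mode assignments above follow the usual multi-granularity lock compatibility rules. As a minimal sketch (the table and helper names below are illustrative and are not the server's lock manager API), this shows which operation types would conflict on the RSTL:

```python
# Standard multi-granularity lock compatibility. True means two different
# operations can hold the lock in these modes at the same time.
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S", "IS"): True,   ("S", "IX"): False,  ("S", "S"): True,   ("S", "X"): False,
    ("X", "IS"): False,  ("X", "IX"): False,  ("X", "S"): False,  ("X", "X"): False,
}

# RSTL mode per operation type, taken from the Solution list.
RSTL_MODE = {
    "stepdown": "S",
    "stepup": "S",
    "rollback": "X",
    "read": "IS",
    "write": "IX",
}


def conflicts(op_a, op_b):
    """Return True if the two operation types cannot hold the RSTL concurrently."""
    return not COMPATIBLE[(RSTL_MODE[op_a], RSTL_MODE[op_b])]
```

Under this table, `conflicts("stepdown", "read")` is false while `conflicts("stepdown", "write")` and `conflicts("rollback", "read")` are true. Note that `conflicts("stepdown", "stepup")` is also false, because S is compatible with S.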
Details:
- An operation must commit to whether it can write at the time it acquires its locks, but this is acceptable (and is essentially already the contract).
- We don’t plan to implement upgrading locks. If something wants to "upgrade" its locks, it must drop all of its locks and reacquire them and make sure that's safe for its own purposes.
- Implement this by taking the same lock mode for the RSTL that you take for the global lock.
- Advantage: stepdown doesn’t need to wait for reads to complete or yield, and yielded readers don’t need to wait for stepdown to complete.
- This fixes the problem for any operation that acquires a global IS lock (user or internal), since stepdown will no longer block waiting for the operation to complete.
- User operations that acquire global S, IX, or X locks are already killed on stepdown so aren't a problem.
- Internal operations that acquire global S, IX, or X locks on user data still must be explicitly killed, so the RangeDeleter, TTL, and any other internal writers to user data must still be audited and fixed.
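The "same mode as the global lock" and "no upgrades" points can be pictured with a toy per-operation lock holder. This is only a sketch under assumptions: `ToyLocker` and its methods are hypothetical names, and acquiring the RSTL before the global lock is an assumed ordering that the description above does not state.

```python
class ToyLocker:
    """Hypothetical per-operation lock holder; not the server's Locker API."""

    def __init__(self):
        # Maps resource name -> held mode ("IS", "IX", "S", or "X").
        self.held = {}

    def lock_global(self, mode):
        # No upgrade path: a caller that wants a stronger mode must first
        # drop everything it holds and then reacquire from scratch.
        if self.held:
            raise RuntimeError("drop all locks before reacquiring in a new mode")
        # The RSTL is taken in the same mode as the global lock.
        # (RSTL-first ordering is an assumption of this sketch.)
        self.held["RSTL"] = mode
        self.held["Global"] = mode

    def unlock_all(self):
        self.held.clear()
```

A reader that later needs to write would call `unlock_all()` and then `lock_global("IX")`. Between those two calls it holds nothing, so it has to re-validate whatever state it depended on.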
Because step up and step down now acquire the RSTL in mode S, and S is compatible with S, concurrent state transitions become possible. To protect against this we will add a new LockManager ResourceMutex that the ReplicationStateTransitionLockGuard acquires after acquiring the RSTL and releases before releasing the RSTL. This should be a straightforward way to let reads and step-ups/step-downs not conflict (via S and IS locks) while step-ups and step-downs still conflict with each other even though they take S locks (via the ResourceMutex, which does not interact with reads at all).
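A minimal sketch of that acquisition order, assuming a toy RSTL that only counts shared (S or IS) holders and a plain mutex standing in for the ResourceMutex. All class and method names here are hypothetical, and the guard uses a non-blocking try-acquire purely so the behavior is easy to observe; the real guard would presumably wait instead.

```python
import threading


class ToyRstl:
    """Toy RSTL: tracks shared holders only; X mode is not modeled."""

    def __init__(self):
        self._count_lock = threading.Lock()
        self.shared_holders = 0
        # Stand-in for the new ResourceMutex; readers never touch it.
        self.transition_mutex = threading.Lock()

    def acquire_shared(self):
        with self._count_lock:
            self.shared_holders += 1

    def release_shared(self):
        with self._count_lock:
            self.shared_holders -= 1


class ToyStateTransitionGuard:
    """Takes the RSTL (shared) first, then the mutex; releases in reverse."""

    def __init__(self, rstl):
        self._rstl = rstl
        self.acquired = False

    def try_acquire(self):
        self._rstl.acquire_shared()  # step 1: RSTL in mode S
        if self._rstl.transition_mutex.acquire(blocking=False):  # step 2: mutex
            self.acquired = True
        else:
            self._rstl.release_shared()  # another transition is running; back out
        return self.acquired

    def release(self):
        if self.acquired:
            self._rstl.transition_mutex.release()  # mutex is released first
            self._rstl.release_shared()            # then the RSTL
            self.acquired = False
```

With one guard held, a reader can still call `acquire_shared()` on the toy RSTL, while a second guard's `try_acquire()` fails until the first guard is released.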
- is depended on by
-
SERVER-41037 Stepup should kill all user operations(that encounters prepare conflict) before taking RSTL lock in X.
- Closed
-
SERVER-41057 Add non-transactional afterClusterTime find to multi_statement_transaction_atomicity_isolation.js
- Closed
- is related to
-
SERVER-40594 Range deleter in prepare conflict retry loop blocks step down
- Closed
-
SERVER-40586 step up instead of stepping down in stepdown suites
- Closed
-
SERVER-40641 Ensure TTL delete in prepare conflict retry loop does not block step down
- Closed
-
SERVER-37988 recover locks on step up at the beginning of the state transition rather than at the end
- Closed
-
SERVER-40487 Stop running the RstlKillOpthread when a node is no longer primary
- Backlog
- related to
-
SERVER-41033 set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption
- Closed
-
SERVER-41034 Invariant if we get a prepare conflict inside runWithoutInterruptionExceptAtGlobalShutdown block.
- Closed
-
SERVER-41035 Rollback should kill all user operations before taking RSTL lock in X.
- Closed
-
SERVER-41036 Make ReadWriteAbility::_canAcceptNonLocalWrites AtomicWord<bool> to prevent torn reads.
- Closed
-
SERVER-42537 Complete TODO listed in SERVER-40700
- Closed