Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.1.12
Affects Version/s: None
Component/s: Replication, Storage
Labels:
- prepare_durability
- todo_in_code

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Repl 2019-05-06, Repl 2019-05-20
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We do not kill user or internal reads on step down, just as we do not kill internal writes on step down. If such an operation hits a prepare conflict, it will cause the deadlock described in ~~SERVER-40594~~.

Solution:

Stepdown and step up will now require RSTL in mode S.
Rollback requires RSTL in mode X.
Reads take RSTL in mode IS.
Writes take RSTL in mode IX.
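
To make the effect of these modes concrete, here is a minimal sketch in Python. It assumes the standard intent-lock compatibility rules (IS, IX, S, X) and is only a toy model, not MongoDB's LockManager code; the names `COMPATIBLE`, `RSTL_MODE`, and `conflicts` are hypothetical.

```python
# Standard multi-granularity lock compatibility (assumed here; toy model only).
# COMPATIBLE[held] is the set of modes that can be granted alongside `held`.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}

# RSTL mode per kind of operation, as listed in the Solution section.
RSTL_MODE = {
    "read": "IS",
    "write": "IX",
    "stepdown": "S",
    "stepup": "S",
    "rollback": "X",
}


def conflicts(op_a: str, op_b: str) -> bool:
    """Return True if op_a and op_b cannot hold the RSTL at the same time."""
    return RSTL_MODE[op_b] not in COMPATIBLE[RSTL_MODE[op_a]]


if __name__ == "__main__":
    for a, b in [("read", "stepdown"), ("write", "stepdown"),
                 ("read", "rollback"), ("stepdown", "stepup")]:
        print(f"{a} vs {b}: {'conflict' if conflicts(a, b) else 'compatible'}")
```

Under these rules a read (IS) never blocks a stepdown (S), a write (IX) does, and rollback (X) conflicts with everything. Two state transitions in mode S are compatible with each other at this level.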

Details:

Operations would need to commit to whether they can write when they acquire a lock, but this is acceptable (and is essentially already the contract).
We don’t plan to implement upgrading locks. If something wants to "upgrade" its locks, it must drop all of its locks and reacquire them and make sure that's safe for its own purposes.
Implement this by taking the same lock mode for the RSTL that you take for the global lock.
An advantage is that stepdown doesn’t need to wait for reads to complete or yield, and yielded readers don’t need to wait for stepdown to complete.
This fixes the problem for any operation that acquires a global IS lock (user or internal), since stepdown will no longer block waiting for the operation to complete.
User operations that acquire global S, IX, or X locks are already killed on stepdown so aren't a problem.
Internal operations that acquire global S, IX, or X locks on user data still must be explicitly killed, so the RangeDeleter, TTL, and any other internal writers to user data must still be audited and fixed.
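
The two locking rules above (mirror the global lock mode on the RSTL, and never upgrade in place) can be sketched as follows. This is a toy Python model under stated assumptions: the class `OperationLocks` and its methods are hypothetical and do not correspond to any server API.

```python
class OperationLocks:
    """Toy model of the modes one operation holds on the RSTL and the global lock."""

    VALID_MODES = ("IS", "IX", "S", "X")

    def __init__(self):
        self.rstl_mode = None
        self.global_mode = None

    def acquire(self, mode: str) -> None:
        # The operation commits up front to whether it can write.
        if mode not in self.VALID_MODES:
            raise ValueError(f"unknown lock mode: {mode}")
        if self.global_mode is not None:
            # No lock upgrades: the caller must release and reacquire.
            raise RuntimeError("locks already held; release before reacquiring")
        # The RSTL is taken in the same mode as the global lock.
        self.rstl_mode = mode
        self.global_mode = mode

    def release(self) -> None:
        self.global_mode = None
        self.rstl_mode = None

    def reacquire(self, mode: str) -> None:
        # "Upgrade" by dropping everything first. Whether that is safe is
        # the caller's responsibility, since state may change in between.
        self.release()
        self.acquire(mode)
```

For example, an operation that started as a reader would call `reacquire("IX")` rather than trying to strengthen its existing IS locks.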

The S mode acquisition on step up and step down means concurrent state transitions could start happening. To protect against this we will add a new LockManager ResourceMutex that the ReplicationStateTransitionLockGuard acquires after acquiring the RSTL and releases before releasing the RSTL. This should be a straightforward way to let reads and step-ups/step-downs not conflict (via S and IS locks) while step-ups and step-downs still conflict with each other even though they take S locks (via the ResourceMutex, which does not interact with reads at all).
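
The acquire and release ordering described above can be illustrated with a small Python sketch. It is an analogy only: `threading.Lock` stands in for the ResourceMutex, the RSTL acquisition is modeled as a log entry, and the names `transition_mutex`, `events`, and `state_transition_guard` are hypothetical.

```python
import threading
from contextlib import contextmanager

# Stand-in for the new ResourceMutex; readers never touch it.
transition_mutex = threading.Lock()

# Ordered log of what the guard did, for illustration.
events = []


@contextmanager
def state_transition_guard(name: str):
    """Model of the guard: RSTL (mode S) first, then the mutex; release in reverse."""
    events.append((name, "acquire RSTL (S)"))  # modeled as a log entry only
    transition_mutex.acquire()                 # serializes state transitions
    events.append((name, "acquire mutex"))
    try:
        yield
    finally:
        events.append((name, "release mutex"))
        transition_mutex.release()
        events.append((name, "release RSTL (S)"))


if __name__ == "__main__":
    with state_transition_guard("stepdown"):
        # A concurrent step up would hold the RSTL in S as well,
        # but it could not take the mutex until this block exits.
        print("mutex held:", transition_mutex.locked())
    for entry in events:
        print(entry)
```

Because readers take only the RSTL in mode IS and never the mutex, they pass through untouched while two transitions queue behind each other.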

is depended on by

SERVER-41037 Stepup should kill all user operations(that encounters prepare conflict) before taking RSTL lock in X.

Closed

SERVER-41057 Add non-transactional afterClusterTime find to multi_statement_transaction_atomicity_isolation.js

Closed

is related to

SERVER-40594 Range deleter in prepare conflict retry loop blocks step down

Closed

SERVER-40586 step up instead of stepping down in stepdown suites

Closed

SERVER-40641 Ensure TTL delete in prepare conflict retry loop does not block step down

Closed

SERVER-37988 recover locks on step up at the beginning of the state transition rather than at the end

Closed

SERVER-40487 Stop running the RstlKillOpthread when a node is no longer primary

Backlog

related to

SERVER-41033 set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption

Closed

SERVER-41034 Invariant if we get a prepare conflict inside runWithoutInterruptionExceptAtGlobalShutdown block.

Closed

SERVER-41035 Rollback should kill all user operations before taking RSTL lock in X.

Closed

SERVER-41036 Make ReadWriteAbility::_canAcceptNonLocalWrites AtomicWord<bool> to prevent torn reads.

Closed

SERVER-42537 Complete TODO listed in SERVER-40700

Closed


Assignee: Pavithra Vetriselvan
Reporter: Judah Schvimer
Participants: Githook User, Judah Schvimer, Pavithra Vetriselvan, Suganthi Mani
Votes: 0
Watchers: 13

Created: Apr 17 2019 08:05:37 PM UTC
Updated: Oct 29 2023 10:21:54 PM UTC
Resolved: May 16 2019 02:32:05 PM UTC
Confidence Status Last Update: 14/May/19 6:11 PM
