[SERVER-40700] Deadlock between read prepare conflicts and state transitions Created: 17/Apr/19  Updated: 29/Oct/23  Resolved: 16/May/19

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Pavithra Vetriselvan
Resolution: Fixed Votes: 0
Labels: prepare_durability, todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-41037 Stepup should kill all user operation... Closed
is depended on by SERVER-41057 Add non-transactional afterClusterTim... Closed
Related
related to SERVER-41033 set ignore_prepare=true throughout an... Closed
related to SERVER-41034 Invariant if we get a prepare conflic... Closed
related to SERVER-41035 Rollback should kill all user operati... Closed
related to SERVER-41036 Make ReadWriteAbility::_canAcceptNonL... Closed
related to SERVER-42537 Complete TODO listed in SERVER-40700 Closed
is related to SERVER-40594 Range deleter in prepare conflict ret... Closed
is related to SERVER-40586 step up instead of stepping down in s... Closed
is related to SERVER-40641 Ensure TTL delete in prepare conflict... Closed
is related to SERVER-37988 recover locks on step up at the begin... Closed
is related to SERVER-40487 Stop running the RstlKillOpthread whe... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-05-06, Repl 2019-05-20
Participants:

 Description   

We do not kill user or internal reads on step down, similarly to how we do not kill internal writes on step down. Thus if they hit a prepare conflict they will cause the deadlock described in SERVER-40594.

Solution:

  1. Stepdown and step up will now require RSTL in mode S.
  2. Rollback requires RSTL in mode X.
  3. Reads take RSTL in mode IS.
  4. Writes take RSTL in mode IX.

Details:

  1. Operations would need to commit to whether they can write when they acquire a lock, but this is acceptable (and is essentially already the contract).
  2. We don’t plan to implement upgrading locks. If something wants to "upgrade" its locks, it must drop all of its locks and reacquire them and make sure that's safe for its own purposes.
  3. Implement this by taking the same lock mode for the RSTL that you take for the global lock.
  4. Advantage that stepdown doesn’t need to wait for reads to complete/yield and yielded readers don’t need to wait for stepdown to complete.
  5. This fixes the problem for any operation that acquires a global IS lock (user or internal) since stepdown will no longer block on the operation to complete.
  6. User operations that acquire global S, IX, or X locks are already killed on stepdown so aren't a problem.
  7. Internal operations that acquire global S, IX, or X locks on user data still must be explicitly killed, so the RangeDeleter, TTL, and any other internal writers to user data must still be audited and fixed.

The S mode acquisition on step up and step down means concurrent state transitions could start happening. To protect against this we will add a new LockManager ResourceMutex that the ReplicationStateTransationLockGuard acquires after acquiring the RSTL, and releases before it. This should be a straightforward way to allow reads and step-ups/step-downs to not conflict (via S and IS locks) but for step-ups and step-downs to conflict even though they take S locks (via the ResourceMutex that does not interact with reads at all).



 Comments   
Comment by Githook User [ 16/May/19 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-40700 fix lint again
Branch: master
https://github.com/mongodb/mongo/commit/a07de3fde0e627e5f9b0d71a922cc84b1fce44ef

Comment by Githook User [ 16/May/19 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-40700 fix lint
Branch: master
https://github.com/mongodb/mongo/commit/8763695da8f2f99c122631ca6a874dcaead3971d

Comment by Githook User [ 16/May/19 ]

Author:

{'name': 'Pavi Vetriselvan', 'username': 'pvselvan', 'email': 'pvselvan@umich.edu'}

Message: SERVER-40700 kill read prepare conflicts during step down
Branch: master
https://github.com/mongodb/mongo/commit/953c6116138800e0fbed79e7654eda1690d56f71

Comment by Suganthi Mani [ 07/May/19 ]

We would not implement the new approach described in the "Description" box. Instead, we will be going with the approach where reads using afterClusterTime that encounter prepare conflicts will be killed during step down.

Implementation Detail:

Whenever the read thread hits prepare conflict, it waits either on this conditional variable WiredTigerSessionCache::_prepareCommittedOrAbortedCond or for interrupt.

  • So, at line 93, set “waitsOnPrepareConflict” flag to true indicating that the thread is currently blocked by prepared conflict and that flag can be stored as a new decoration on OperationContext.
  • Resets the flag to false (via makeGuard or OnBlockExit) once waitUntilPreparedUnitOfWorkCommitsOrAborts is finished executing.
  • Make step down’s RstlKillOpthread to kill all user operations  that have flag “waitsOnPrepareConflict” set to true.
Comment by Judah Schvimer [ 06/May/19 ]

One note here, after talking with geert.bosch, is that we can upgrade locks already through lock conversion. It's possible we don't currently do this anywhere, but the lock manager supports converting locks from a weaker mode to a stronger mode. This would cause problems with the original design here, but may not with the new plan.

Comment by Judah Schvimer [ 18/Apr/19 ]

1) With the new approach, both step down and step up will acquire RSTL in S mode? If yes, Could you update the the sentence "Stepdown will now require RSTL in S." in the description.

Yes, done.

This means, read thread(1) and step down thread(2) doesn't conflict each other. So, I feel we don't want to kill the operations that acquire global lock in S mode.

Great point! I agree. I don't think we need to kill operations that acquire the global lock in S mode anymore. Can you clarify why the 3 way deadlock only manifested with a global lock in S mode and not with a collection or DB lock in S mode? It seems to me that the following would be possible:

1. Read acquired RSTL in IX and attempts to acquire collection lock in S mode but blocks behind a prepared transaction holding it.
2. Stepdown blocks on the read to acquire RSTL in mode X.
3. The prepared transaction blocks on the step down to commit or abort.

I think this scenario goes away when the read acquires RSTL in mode S or IS and the stepdown acquires the RSTL in mode S so they do not conflict.

Can you also please clarify why we didn't need to kill S lock holders on step up? I don't think we need to anymore with this new design, but I want to understand why we didn't in the first place.

We can do a single pass by just keeping track of RSTL lock mode in the locker instead of global lock mode.

Can you please remind me why we had the looping behavior in the first place? I looked in SERVER-38511 but cannot find the motivation.

Comment by Suganthi Mani [ 18/Apr/19 ]

judah.schvimer, I am trying to understand the solution and details in the description. And, I have few initial questions regarding that

1) With the new approach, both step down and step up will acquire RSTL in S mode? If yes, Could you update the the sentence "Stepdown will now require RSTL in S." in the description.
2) Currently we are killing global lock in S mode to avoid the below 3 way deadlock

1. Read(DBhash) thread acquired  RSTL lock in IX mode , global lock in S mode and waits for collection/DB lock held by prepared transaction.
2. Step down Enqueued RSTL lock in X mode. And, blocked behind read thread.
3. Prepared transaction waiting for RSTL lock to acquire in IX mode and blocked behind step down thread.

With the new approach, I assume that the thread that takes global lock in S mode would be considered as reads and they would take RSTL in S mode.This means, read thread(1) and step down thread(2) doesn't conflict each other. So, I feel we don't want to kill the operations that acquire global lock in S mode.
3) With the new approach, I also feel the  looping design during step down can be avoided as reads take RSTL in IS and write takes RSTL in IX (equivalent to global lock modes). We can do a single pass by just keeping track of RSTL lock mode in the locker instead of global lock mode. 

CC tess.avitabile

Comment by Judah Schvimer [ 17/Apr/19 ]

Note that step-up doesn't need to kill operations since the only operations that can take place outside of secondary batch application on secondaries are reads that will no longer conflict with the RSTL. We currently kill operations that acquire global locks in mode S, but we may be able to relax that check here. Otherwise we may need to kill those operations on step-up.

Generated at Thu Feb 08 04:55:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.