[SERVER-48641] Deadlock due to the MigrationDestinationManager waiting for write concern with the session checked-out Created: 08/Jun/20  Updated: 29/Oct/23  Resolved: 16/Jul/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.4.0-rc8
Fix Version/s: 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: KP44
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-47645 Must invalidate all sessions on step ... Closed
Related
related to SERVER-73106 [v4.4] Chunk migration attempts to wa... Closed
is related to SERVER-48689 MigrationDestinationManager waits for... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-07-13, Sharding 2020-07-27
Participants:
Linked BF Score: 40

 Description   

The MigrationDestinationManager checks out a session and then executes the recipient logic while that session is checked out.

At some point the execution logic reaches a call to waitForWriteConcern, which runs with the session still checked out.

Because the JournalFlusher wait is non-interruptible (and also because SERVER-40081 prohibits waitForWriteConcern while having a session checked out), this causes a three-thread deadlock with the replication coordinator:

  • T1: MigrationDestinationManager has a session checked-out and is waiting on waitForWriteConcern, which in turn is blocked on JournalFlusher::waitForJournalFlush
  • T2: The JournalFlusher is waiting on a MODE_IX RSM lock, which is held in MODE_X by ReplCoord-3
  • T3: ReplCoord-3, while holding the RSM lock in MODE_X, is killing sessions by calling invalidateSessionsForStepdown and this is blocked on the session checked-out by T1
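
The three bullets above form a cycle in a wait-for graph. The following is a minimal illustrative sketch in Python (the server itself is C++), assuming a hypothetical `WAITS_FOR` mapping and `find_cycle` helper that are not part of the MongoDB codebase. The edges only restate the bullets; they are not read from a real process.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph: each thread maps to the thread it is blocked on.
# These edges restate the T1/T2/T3 bullets; they are not taken from a live server.
WAITS_FOR: Dict[str, str] = {
    # T1 holds the session and waits for the journal flush.
    "T1 MigrationDestinationManager": "T2 JournalFlusher",
    # T2 waits for the RSM lock in MODE_IX, held in MODE_X by T3.
    "T2 JournalFlusher": "T3 ReplCoord-3",
    # T3 waits for the session that T1 has checked out.
    "T3 ReplCoord-3": "T1 MigrationDestinationManager",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`.

    Returns the list of threads forming a cycle, or None if the chain
    ends at a thread that is not blocked on anything.
    """
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None
    # `node` was already visited, so everything from its first visit onward is the cycle.
    return path[position[node]:]
```

Calling `find_cycle(WAITS_FOR, "T1 MigrationDestinationManager")` returns all three threads. Removing the T3 edge, which models T1 no longer holding the session while it waits, leaves no cycle.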


 Comments   
Comment by Githook User [ 12/Aug/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-48641 SERVER-48689 Yield session in migration destination driver when waiting on replication and session migration

(cherry picked from commit 21b083c7352704fc8c3d8a4f33c54040259ff766)
Branch: v4.4
https://github.com/mongodb/mongo/commit/91f3ad01c5fe5599d9ba679a659745fa3b7eb00b

Comment by Tess Avitabile (Inactive) [ 27/Jul/20 ]

Great, thank you!

Comment by Esha Maharishi (Inactive) [ 27/Jul/20 ]

kaloian.manassiev yes, they are two different deadlocks that had the same root cause. Both deadlocks should only have existed on 4.4, since they were due to code introduced to the MigrationDestinationManager in 4.4.

Comment by Kaloian Manassiev [ 27/Jul/20 ]

This specific bug is entirely new for 4.4, so it has no effect on 4.2. I don't understand the difference between it and SERVER-48689, except that maybe the latter is a different manifestation of the same problem. In either case, neither of the two should be present in 4.2.

esha.maharishi, I think Jack is on vacation - can you confirm my understanding?

Comment by Tess Avitabile (Inactive) [ 27/Jul/20 ]

Does this affect 4.2? We need to backport SERVER-47645 to 4.2.

Comment by Githook User [ 16/Jul/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-48641 SERVER-48689 Yield session in migration destination driver when waiting on replication and session migration
Branch: master
https://github.com/mongodb/mongo/commit/21b083c7352704fc8c3d8a4f33c54040259ff766

Comment by Jack Mulrow [ 08/Jul/20 ]

alex.taskov, matthew.saltz, what do you think of the following proposed fix? (Tagging you both since you were on the resumable range deleter project. Also CC esha.maharishi for when she's back from vacation.)

As far as I can tell, we check out the session for the entire recipient logic for two reasons: (1) to detect when a _recvChunkStart begins after a migration has already finished and been cleaned up (due to a split brain), and (2) so that when the transaction number on the recipient is advanced as part of recovering a migration, it can only be advanced before or after all of the recipient logic. That lets the recovery safely delete the range deletion document on the recipient and trigger a range deletion (otherwise orphans from an active cloning phase might be inserted after the deletion). Am I missing any reasons?

If that's true, then I think we can fix this problem (and SERVER-48689) by either:

  1. Making the recipient yield the session every place it waits for write concern and when it joins the session migration thread (to fix SERVER-48689), and having it verify that the active transaction number has not changed immediately upon checking the session back out (to guarantee no orphans can be inserted after the number is advanced by the recovery). I don't think the session migration can generate orphan documents, so it should be ok for it to run without the migration session checked out.
  2. Only checking out the session when writing to config.rangeDeletions here (that's the only recipient-initiated write that actually uses the retryable writes machinery from what I can tell) and changing migration recovery to wait for the recipient to complete some other way, e.g. sending _recvChunkStatus in a loop.

What do you guys think? I slightly prefer approach 1), since I expect it would be easier to implement, although it might be trickier to test.
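
Approach 1 can be modeled roughly as follows. This is an illustrative Python sketch only, not the server's C++ implementation: `Session`, `TxnNumberChanged`, and `yielded` are hypothetical names invented here, and the model is single-threaded, so it shows the yield-then-verify shape rather than real concurrency.

```python
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for a logical session that can be checked out."""

    def __init__(self, txn_number: int) -> None:
        self.txn_number = txn_number
        self.checked_out = False

    def check_out(self) -> None:
        if self.checked_out:
            raise RuntimeError("session is already checked out")
        self.checked_out = True

    def check_in(self) -> None:
        self.checked_out = False


class TxnNumberChanged(Exception):
    """Raised when the transaction number advanced while the session was yielded."""


@contextmanager
def yielded(session: Session):
    """Check the session in around a blocking wait.

    On exit the session is checked back out, and the transaction number is
    compared with the value seen before yielding. A mismatch means something
    (e.g. migration recovery) advanced it, so the caller must stop.
    """
    expected = session.txn_number
    session.check_in()
    try:
        yield
    finally:
        session.check_out()
    if session.txn_number != expected:
        raise TxnNumberChanged(
            f"expected txn number {expected}, found {session.txn_number}"
        )
```

In this model the recipient would wrap each wait for write concern in `with yielded(session): ...`. While the block runs, a step-down can invalidate the session without blocking on the recipient, and a `TxnNumberChanged` error on exit tells the recipient to abort before it can insert further documents.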
