Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.7, 5.0.0-rc1, 5.1.0-rc0
Affects Version/s: None
Component/s: Replication, Sharding
Labels: sharding-wfbf-sprint
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.0, v4.4
Linked BF Score: 124
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

When this deadlock occurs, the MigrationDestinationManager is holding the session checked out in what it calls "outerOpCtx". It then dispatches other threads with other opCtxs to do work on its behalf (in _migrateDriver()). Those opCtxs will not be killed by killSessions, because they do not have the session checked out. So what happens is:

1. outerOpCtx holds the session, but is not being used otherwise. In fact, it's not on a thread because an AlternativeClientRegion has been used.
2. Stepdown kills all user operations and all system operations marked to be killable on stepdown.
3. _migrateDriver() (either cloneDocuments or _applyMigrateOp) creates a new operation.
4. Stepdown kills all sessions. But now we're stuck: the outerOpCtx doesn't receive the kill because it's swapped out of its thread. The new operation doesn't receive the kill because it's not associated with the session. The new operation gets stuck waiting for the RSTL, the stepdown thread gets stuck waiting for the session to be checked in, and we've got deadlock.
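The sequence above can be replayed with a small toy model. This is only a sketch of the kill-delivery rules as described in this ticket, not the server's code: `Op`, `ToyNode`, and all of their fields and methods are hypothetical names, and the RSTL wait itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Toy stand-in for an opCtx (hypothetical, not the server's class)."""
    name: str
    on_thread: bool = True             # False models being swapped out of its thread
    killable_on_stepdown: bool = True
    session: Optional[str] = None      # session this op has checked out, if any
    killed: bool = False


@dataclass
class ToyNode:
    ops: List[Op] = field(default_factory=list)

    def kill_ops_for_stepdown(self) -> None:
        # Only ops that already exist and are attached to a thread see this kill.
        for op in self.ops:
            if op.on_thread and op.killable_on_stepdown:
                op.killed = True

    def kill_sessions(self) -> None:
        # The kill goes to whichever op has the session checked out,
        # and only reaches it if that op is attached to a thread.
        for op in self.ops:
            if op.session is not None and op.on_thread:
                op.killed = True

    def sessions_still_checked_out(self) -> List[str]:
        return [op.session for op in self.ops
                if op.session is not None and not op.killed]


def simulate():
    node = ToyNode()
    # Step 1: outerOpCtx holds the session but is swapped out of its thread.
    outer = Op("outerOpCtx", on_thread=False, session="migration-session")
    node.ops.append(outer)
    # Step 2: stepdown kills the operations that exist right now.
    node.kill_ops_for_stepdown()
    # Step 3: the migrate driver creates a new operation with no session.
    inner = Op("cloneDocuments")
    node.ops.append(inner)
    # Step 4: stepdown kills all sessions.
    node.kill_sessions()
    return outer, inner, node.sessions_still_checked_out()
```

In this model neither operation ends up killed and the session stays checked out, which is the state the stepdown thread then waits on indefinitely.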

I can see a few ways to fix this. One way would be to officially allow opCtxs to do work on behalf of a session they don't have checked out; they would then get kills delivered to them (and assigning an opCtx to an already-killed session would auto-kill it). The accounting might get ugly. We could also do something like what PrimaryOnlyService does, which is basically the same thing done "manually": register each opCtx created during migration somewhere. Then the outerOpCtx, instead of being swapped out, waits for a kill. When it gets one, it kills all registered opCtxs.
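A minimal sketch of the registration idea, combining both variants described above (kill fan-out to registered operations, plus auto-kill when registering against an already-killed session). `ToyOp` and `MigrationOpRegistry` are hypothetical names for illustration; this is not how PrimaryOnlyService or the eventual fix is implemented.

```python
import threading


class ToyOp:
    """Toy stand-in for an opCtx; `killed` is set when the op is interrupted."""

    def __init__(self, name):
        self.name = name
        self.killed = threading.Event()

    def kill(self):
        self.killed.set()


class MigrationOpRegistry:
    """Hypothetical registry of ops doing work on behalf of one session."""

    def __init__(self, outer):
        self._lock = threading.Lock()
        self._outer = outer
        self._children = []
        self._session_killed = False

    def register(self, op):
        # Registering under the lock closes the race with kill_session():
        # an op either gets killed by the fan-out or is killed right here.
        with self._lock:
            self._children.append(op)
            if self._session_killed:
                op.kill()
        return op

    def kill_session(self):
        # Deliver the kill to the session holder and to every registered op.
        with self._lock:
            self._session_killed = True
            self._outer.kill()
            for child in self._children:
                child.kill()
```

A usage example under the same assumptions:

```python
outer = ToyOp("outerOpCtx")
registry = MigrationOpRegistry(outer)
cloner = registry.register(ToyOp("cloneDocuments"))
registry.kill_session()                                 # stepdown kills the session
late = registry.register(ToyOp("_applyMigrateOp"))      # created after the kill
print(cloner.killed.is_set(), late.killed.is_set())
```

The point of holding one lock across both paths is that no operation can slip in between "session marked killed" and "children interrupted", which is the window the deadlock falls into.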

Or we could have the kill loop in shutdown time out if a session isn't killed in time, and loop back and kill the operations again. This is inelegant, though, and runs the risk of livelock.
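The timeout-and-retry alternative could look roughly like the sketch below. The function name, its parameters, and the round cap are assumptions made for illustration; the cap is one way to turn a potential livelock into a visible failure, not something proposed in this ticket.

```python
import threading


def kill_with_retry(kill_all_ops, session_checked_in, wait_secs=0.01, max_rounds=5):
    """Kill operations repeatedly until the session is checked back in.

    kill_all_ops:        callable that interrupts every currently existing op
    session_checked_in:  threading.Event set once the session is released
    Returns the number of kill rounds that were needed.
    """
    for round_no in range(1, max_rounds + 1):
        # Re-kill on every round so operations created since the last
        # round are interrupted too.
        kill_all_ops()
        # Wait a bounded time for the session instead of blocking forever.
        if session_checked_in.wait(timeout=wait_secs):
            return round_no
    raise TimeoutError(
        "session still checked out after %d kill rounds" % max_rounds
    )
```

Without the cap, a migration that keeps spawning new operations faster than the loop kills them would keep the loop spinning, which is the livelock risk mentioned above.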

related to:
- SERVER-57709 Make MigrationDestinationManager's inserter thread interruptible on stepdown (Closed)
- SERVER-57756 Race between concurrent stepdowns and applying transaction oplog entry (Closed)
- SERVER-60161 Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command (Closed)

Assignee: Pierlauro Sciarelli
Reporter: Matthew Russotto
Participants: Esha Maharishi, Githook User, Matthew Russotto, Pierlauro Sciarelli
Votes: 0
Watchers: 15

Created: Mar 26 2021 08:37:11 PM UTC
Updated: Oct 29 2023 09:55:41 PM UTC
Resolved: May 25 2021 11:26:07 AM UTC
