When this deadlock occurs, the MigrationDestinationManager is holding the session checked out in what it calls "outerOpCtx". It then dispatches other threads with other opCtxs to do work on its behalf (in _migrateDriver()). Those opCtxs will not be killed by killSessions, because they do not have the session checked out. So the sequence of events is:
1. outerOpCtx holds the session but is not otherwise in use. In fact, it's not on a thread at all, because an AlternativeClientRegion has been used.
2. Stepdown kills all user operations and all system operations marked as killable on stepdown.
3. _migrateDriver() (either cloneDocuments or _applyMigrateOp) creates a new operation.
4. Stepdown kills all sessions. But now we're stuck: the outerOpCtx doesn't receive the kill because it's swapped out of its thread, and the new operation doesn't receive the kill because it's not associated with the session. The new operation gets stuck waiting for the RSTL, the stepdown thread gets stuck waiting for the session to be checked in, and we've got a deadlock.
I can see a few ways to fix this. One way would be to officially allow opCtxs to do work on behalf of a session they didn't have checked out; they would then get kills delivered to them (and assigning an opCtx to an already-killed session would auto-kill it). The accounting might get ugly. We could also do something like PrimaryOnlyService does, which is basically the same only "manually" – register each opCtx created during migration somewhere. Then the outerOpCtx, instead of being swapped out, is waiting for a kill. When it gets it, it kills all registered opCtxs.
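To make the "manual" registration idea concrete, here is a rough standalone sketch. It is not server code: FakeOpCtx and MigrationOpRegistry are hypothetical names invented for illustration, and the kill is modeled as a plain flag rather than a real OperationContext interruption. It shows the two behaviors described above: a kill fans out to every registered opCtx, and an opCtx registered after the kill is killed immediately.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an OperationContext: only carries a kill flag.
struct FakeOpCtx {
    std::atomic<bool> killed{false};
    void markKilled() { killed.store(true); }
    bool isKilled() const { return killed.load(); }
};

// Hypothetical registry owned by the migration (not an existing server class).
class MigrationOpRegistry {
public:
    // Registers a worker opCtx. If the session has already been killed, the
    // worker is killed on the spot so it never starts blocking work.
    void registerOp(const std::shared_ptr<FakeOpCtx>& op) {
        assert(op && "worker opCtx must not be null");
        std::lock_guard<std::mutex> lk(_mutex);
        if (_sessionKilled) {
            op->markKilled();
            return;
        }
        _ops.push_back(op);
    }

    // Called once the outer opCtx observes the session kill: fan the kill out
    // to every worker that is still alive.
    void killAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        _sessionKilled = true;
        for (auto& weak : _ops) {
            if (auto op = weak.lock()) {
                op->markKilled();
            }
        }
        _ops.clear();
    }

private:
    std::mutex _mutex;
    bool _sessionKilled = false;
    std::vector<std::weak_ptr<FakeOpCtx>> _ops;  // weak: workers may finish first
};
```

Holding the registry mutex across both the "already killed" check and the registration is what closes the race in which a new operation is created between the two stepdown kill passes.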
Or we could have the kill loop in shutdown time out if a session isn't killed in time, then loop back and kill the operations again. This is inelegant, though, and runs the risk of livelock.
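For comparison, the timeout-and-retry alternative might look roughly like the sketch below. Everything here is an assumption for illustration: killSessionWithRetry, the two callbacks, and the maxAttempts bound are hypothetical, not existing server functions. The attempt cap is one possible way to turn the livelock risk into an explicit failure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: repeatedly kill operations, then wait (with a timeout)
// for the session to be checked in. Returns the number of attempts used, or
// -1 if the session was still checked out after maxAttempts rounds.
int killSessionWithRetry(
    const std::function<void()>& killOperations,
    const std::function<bool(std::chrono::milliseconds)>& waitForCheckIn,
    std::chrono::milliseconds timeout,
    int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        // Re-kill on every round so operations created since the last pass
        // are also interrupted.
        killOperations();
        if (waitForCheckIn(timeout)) {
            return attempt;
        }
    }
    return -1;  // gave up rather than spinning forever
}
```

The caller still has to decide what to do on -1, which is part of why this option is less attractive than delivering the kill to the right opCtxs in the first place.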