Currently, a 3-way deadlock can occur in the scenario below if step up doesn't wait for the index build to complete:
- The node is a secondary and starts an index build as a background operation, run inside runWithoutInterruptionExceptAtGlobalShutdown. Suppose the index build's plan executor has yielded its locks at this point.
- Step up is able to acquire the RSTL in mode X and complete the step-up process.
- The node is now primary, so the index build can be blocked by a prepared transaction due to a prepare conflict. (Note: on secondaries, a mechanism prevents transactions from being prepared while an index build is in progress. So step up or rollback, which takes the RSTL in X mode without killing operations, can't run into this 3-way deadlock there.)
- The node then tries to step down, which gets blocked behind the index build. Step down can't kill the index build for two reasons: 1) the connection is internal, and 2) the index build runs with an interrupt guard (runWithoutInterruptionExceptAtGlobalShutdown).
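- The commitTransaction command for the prepared transaction is waiting to acquire the RSTL in IX mode but is blocked behind the step-down thread.
The result is a cycle: the index build waits for the prepared transaction, its commitTransaction waits for step down, and step down waits for the index build.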
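To make the cycle concrete, the scenario can be modeled as a wait-for graph in which an edge A -> B means "A is blocked until B finishes". This is an illustrative sketch only: the node names are labels chosen for the example, not MongoDB identifiers, and the cycle search is a plain depth-first search, not anything the server itself runs. It shows that the three waits form a cycle, and that the cycle disappears once the index build has finished before the node becomes primary (the fix this ticket proposes: step up waiting for the index build to complete).

```python
from typing import Dict, List, Optional

# Illustrative wait-for graph: an edge A -> B means "A is blocked until B finishes".
# The node names are labels for this sketch only, not MongoDB identifiers.
DEADLOCK_SCENARIO: Dict[str, List[str]] = {
    # On the new primary, the index build hits a prepare conflict and waits
    # for the prepared transaction to be committed.
    "index build": ["commitTransaction"],
    # commitTransaction needs the RSTL in IX mode, queued behind step down.
    "commitTransaction": ["step down"],
    # Step down cannot interrupt the index build, so it waits for it.
    "step down": ["index build"],
}

# Assumed effect of the fix: the index build finished before the node became
# primary, so nothing waits on it and it waits on nothing.
WITH_FIX: Dict[str, List[str]] = {
    "commitTransaction": ["step down"],
    "step down": [],
}


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None if acyclic."""
    on_path: List[str] = []  # nodes on the current DFS path
    done = set()             # nodes fully explored without finding a cycle

    def dfs(node: str) -> Optional[List[str]]:
        on_path.append(node)
        for nxt in waits_for.get(node, []):
            if nxt in on_path:  # back edge: the wait chain loops
                return on_path[on_path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        on_path.pop()
        done.add(node)
        return None

    for start in waits_for:
        if start not in done:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


print(find_cycle(DEADLOCK_SCENARIO))
# ['index build', 'commitTransaction', 'step down', 'index build']
print(find_cycle(WITH_FIX))
# None
```

Removing any single edge breaks the cycle; the proposal targets the edges involving the index build, since it is the one party that can neither be killed nor prevented from hitting the prepare conflict once the node is primary.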
This ticket should make step up wait for the background operation (index build) to complete after it has acquired the RSTL in X mode and released the repl mutex (as we already do in rollback).
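The ordering requested here can be sketched with Python threading primitives. This is a hypothetical model, not MongoDB code: `ReplCoordSketch`, `start_index_build`, and `step_up` are made-up names, the RSTL in mode X is modeled as a plain exclusive lock, and the sketch assumes (as the proposal implies) that the in-progress index build can run to completion while step up holds the RSTL in X mode. What it shows is the order of operations: take the RSTL in X mode, release the repl mutex, wait until no index build is active, and only then finish the transition to primary.

```python
import threading
import time
from typing import Callable, List


class ReplCoordSketch:
    """Hypothetical model of the proposed step-up ordering (not MongoDB's API)."""

    def __init__(self) -> None:
        self.repl_mutex = threading.Lock()  # stands in for the repl mutex
        self.rstl = threading.Lock()        # stands in for the RSTL taken in mode X
        self._builds_cv = threading.Condition()
        self._active_index_builds = 0
        self.state = "SECONDARY"
        self.log: List[str] = []            # order of the key events

    def start_index_build(self, work: Callable[[], None]) -> threading.Thread:
        """Run `work` as a background index build and track it until it finishes."""
        with self._builds_cv:
            self._active_index_builds += 1

        def run() -> None:
            try:
                work()
            finally:
                # Record completion, then wake anyone waiting for builds to drain.
                with self._builds_cv:
                    self.log.append("index build finished")
                    self._active_index_builds -= 1
                    self._builds_cv.notify_all()

        thread = threading.Thread(target=run)
        thread.start()
        return thread

    def step_up(self) -> None:
        with self.rstl:                          # 1. RSTL in mode X
            self.log.append("RSTL acquired in X mode")
            with self.repl_mutex:                # 2. bookkeeping under the repl mutex
                self.log.append("step up started")
            # 3. The repl mutex is released, so the wait below never holds it.
            with self._builds_cv:
                self._builds_cv.wait_for(lambda: self._active_index_builds == 0)
            # 4. Background index builds have drained; finish the transition.
            with self.repl_mutex:
                self.state = "PRIMARY"
                self.log.append("step up complete")


coord = ReplCoordSketch()
build = coord.start_index_build(lambda: time.sleep(0.05))  # simulated slow build
coord.step_up()
build.join()
print(coord.state)  # PRIMARY
print(coord.log.index("index build finished") < coord.log.index("step up complete"))  # True
```

`Condition.wait_for` checks the predicate before blocking and again on every wake-up, so a build that finishes before `step_up` starts waiting is not missed.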