Details
Type: Bug
Resolution: Fixed
Priority: Major - P3
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2022-01-24, Sharding 2022-02-07
Description
There are two known bugs related to resuming prepared retryable internal transactions after failover:
- When a new primary steps up, it resumes all prepared transactions without doing a refresh. For a retryable internal transaction, the TransactionParticipant therefore has an empty p().activeTxnCommittedStatements after the transaction commits. The map is populated by onPreparedTransactionCommit(), which does not run on secondaries, and secondaries do not call addTransactionOperation() while applying the applyOps oplog entries for transactions. As a result, any retry, with or without an internal transaction, re-executes the write statements that were executed in that transaction.
- When a new primary steps up and there is a prepared retryable internal transaction, the node hangs in the step for refreshing the locks. When it checks out the internal session, it tries to refresh the parent session (new behavior introduced in SERVER-62020) and hangs because this side opCtx cannot acquire the global IS lock while the main opCtx is holding the RSTL lock.
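The first bug can be illustrated with a toy model. This is only a sketch under stated assumptions, not MongoDB code: `ToyParticipant`, `committed_stmts`, `refresh_from_oplog`, and the other names are hypothetical stand-ins for the TransactionParticipant, its in-memory p().activeTxnCommittedStatements map, and a refresh step. It shows how a node that learned about a transaction only by applying oplog entries has no in-memory record of the committed statements, so a retry after step-up runs the write again unless the map is rebuilt first.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParticipant:
    """Toy stand-in for a node's per-session transaction state (hypothetical)."""
    oplog: list = field(default_factory=list)            # replicated history
    committed_stmts: dict = field(default_factory=dict)  # in-memory only
    executions: list = field(default_factory=list)       # writes actually applied

    def commit_as_primary(self, stmt_ids):
        # On the primary, the commit path records which statements ran.
        for stmt_id in stmt_ids:
            self.executions.append(stmt_id)
            self.committed_stmts[stmt_id] = "committed"
        self.oplog.append({"applyOps": list(stmt_ids)})

    def apply_as_secondary(self, entry):
        # A secondary applies the writes but never fills the in-memory map.
        self.oplog.append(entry)
        self.executions.extend(entry["applyOps"])

    def refresh_from_oplog(self):
        # Hypothetical refresh: rebuild the in-memory map from the oplog.
        for entry in self.oplog:
            for stmt_id in entry["applyOps"]:
                self.committed_stmts[stmt_id] = "committed"

    def retry(self, stmt_id):
        # A retry should be a no-op if the statement already executed.
        if stmt_id in self.committed_stmts:
            return "already executed"
        self.executions.append(stmt_id)
        return "re-executed"


# The old primary commits statements 0 and 1, then fails over.
old_primary = ToyParticipant()
old_primary.commit_as_primary([0, 1])

# The new primary only ever applied the oplog entries as a secondary.
new_primary = ToyParticipant()
for entry in old_primary.oplog:
    new_primary.apply_as_secondary(entry)

buggy = new_primary.retry(0)      # map is empty, so the write runs again
new_primary.refresh_from_oplog()
fixed = new_primary.retry(1)      # map was rebuilt, so the retry is recognized
```

In this model the only difference between the buggy and the correct retry is whether the in-memory map was rebuilt before the retry arrived, which mirrors the missing refresh described in the first bullet.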