[SERVER-66110] Downgrading FCV can cause the active txnNumber on TransactionParticipant to change between session yielding and unyielding Created: 02/May/22  Updated: 29/Oct/23  Resolved: 05/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc5, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: Sharding NYC 2022-05-16
Participants:

 Description   

Consider an external session with latest txnNumber=5 where the txnNumber corresponds to a retryable write that was executed using an internal transaction.

  1. client0 starts downgrading the FCV to 5.0.
  2. Right after the setFCV thread finishes aborting unprepared transactions and waiting for prepared transactions to complete, client1 starts a transaction with txnNumber=6 and runs a write statement inside it. That statement is executed via the transaction API. To hand off the transaction to the opCtx created by the API, the original opCtx for the transaction yields the TransactionParticipant. The transaction API checks out the session, executes the write statement, and checks the session back in. It doesn’t commit the transaction, since the transaction is owned by the external client.
  3. The setFCV thread sets the “txnNum” of the config.transactions entry for the external session to 5 and its “lastWriteTime” to {t: 1, ts: Timestamp(1, 0)}. This direct write to config.transactions invalidates the TransactionParticipant for the external session.
  4. The original opCtx for the transaction unyields the TransactionParticipant. Upon checking out the session, it refreshes the TransactionParticipant from disk. After refresh, the active txnNumber on TransactionParticipant is 5 instead of 6. As described in SERVER-66000, setFCV doesn’t set or unset the “state” field of the config.transactions entry for the external session so there are two cases here:
    • [A] If the txnNumber before txnNumber=5 corresponds to a transaction, the refreshed TransactionParticipant would have state “committed”, so unstashing would fail with TransactionCommitted.
    • [B] If the txnNumber before txnNumber=5 corresponds to a retryable write, the refreshed TransactionParticipant would have state “none”, and unstashing would return early, so it does not fail.
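
The interleaving above can be sketched as a small toy model. This is not the server's code: ToyParticipant, DiskEntry, set_fcv_direct_write and run_race are hypothetical names, and the assumption that the on-disk entry starts at txnNum=4 is made only for illustration. The sketch shows that after the refresh the active txnNumber is 5 rather than 6, and that the outcome of unstashing depends on the “state” value that setFCV left untouched.

```python
from dataclasses import dataclass
from typing import Optional


class TransactionCommitted(Exception):
    """Toy stand-in for the TransactionCommitted error."""


@dataclass
class DiskEntry:
    """Toy config.transactions entry for the external session."""
    txn_num: int
    state: Optional[str]  # "committed", or None when the field is absent


class ToyParticipant:
    """Hypothetical, heavily simplified model of a TransactionParticipant."""

    def __init__(self, disk: DiskEntry, active_txn_number: int):
        self.disk = disk
        self.active_txn_number = active_txn_number
        self.state = "none"
        self.valid = True

    def begin_transaction(self, txn_number: int) -> None:
        self.active_txn_number = txn_number
        self.state = "inProgress"

    def invalidate(self) -> None:
        self.valid = False

    def unyield(self) -> str:
        if not self.valid:
            # Checking the session out again refreshes in-memory state from disk.
            self.active_txn_number = self.disk.txn_num
            self.state = self.disk.state or "none"
            self.valid = True
        # Toy version of unstashing.
        if self.state == "committed":
            raise TransactionCommitted(self.active_txn_number)
        if self.state == "none":
            return "returned early"
        return "resumed"


def set_fcv_direct_write(disk: DiskEntry, participant: ToyParticipant) -> None:
    # Step 3: only txnNum is rewritten here; "state" is left as it was.
    disk.txn_num = 5
    participant.invalidate()


def run_race(prior_state: Optional[str]):
    disk = DiskEntry(txn_num=4, state=prior_state)  # txnNum=4 is an assumption
    participant = ToyParticipant(disk, active_txn_number=5)
    participant.begin_transaction(6)          # step 2: client1 starts txnNumber=6
    set_fcv_direct_write(disk, participant)   # step 3: runs while yielded
    try:
        outcome = participant.unyield()       # step 4
    except TransactionCommitted:
        outcome = "TransactionCommitted"
    return participant.active_txn_number, outcome


print(run_race("committed"))  # case [A]
print(run_race(None))         # case [B]
```

In this model, case [A] surfaces TransactionCommitted for a transaction that was never committed, while case [B] silently continues with the wrong active txnNumber.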

[A] is problematic because TransactionCommitted is not among the errors handled by drivers, so the error would be returned to the external client. It is also misleading, since the transaction has not been committed yet.

[B] shouldn’t cause any issues: when the client sends a command containing additional statements (no startTransaction) or a commit/abortTransaction command, that command would fail with NoSuchTransaction, which is a transient transaction error, so the drivers would retry the transaction with a higher txnNumber.
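
The difference between the two cases from the client's point of view can be illustrated with a toy retry loop. This is only a sketch of the general pattern (retry when the error carries the TransientTransactionError label, otherwise surface it), not any driver's actual implementation; CommandError, run_with_retry and the two attempt functions are hypothetical names.

```python
class CommandError(Exception):
    """Toy stand-in for a server error with a code name and error labels."""

    def __init__(self, code_name, labels=()):
        super().__init__(code_name)
        self.code_name = code_name
        self.labels = frozenset(labels)


def run_with_retry(attempt, first_txn_number, max_attempts=3):
    """Retry `attempt` with a higher txnNumber on transient transaction errors."""
    txn_number = first_txn_number
    for _ in range(max_attempts):
        try:
            return attempt(txn_number)
        except CommandError as err:
            if "TransientTransactionError" not in err.labels:
                raise  # not retryable here: returned to the caller as-is
            txn_number += 1
    raise RuntimeError("gave up after repeated transient errors")


def case_b_attempt(txn_number):
    # Case [B]: the follow-up command for txnNumber=6 fails with NoSuchTransaction.
    if txn_number == 6:
        raise CommandError("NoSuchTransaction", ["TransientTransactionError"])
    return f"committed txnNumber={txn_number}"


def case_a_attempt(txn_number):
    # Case [A]: the error is not a transient transaction error in this model.
    raise CommandError("TransactionCommitted")


print(run_with_retry(case_b_attempt, 6))
try:
    run_with_retry(case_a_attempt, 6)
except CommandError as err:
    print("surfaced to client:", err.code_name)
```

Under these assumptions, case [B] recovers on the next txnNumber, while case [A] reaches the application unchanged.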

Similar issues also exist for internal transactions for retryable writes, although the steps differ:

  1. client0 runs a retryable write statement with txnNumber=6. To hand off the transaction to the opCtx created by the API, the original opCtx for the retryable write yields the TransactionParticipant. The transaction API checks out an internal session for the retryable write, executes the write statement in a transaction in that session, commits the transaction, and checks the session back in.
  2. client1 runs setFCV: 5.0 to completion.
  3. The opCtx for the retryable write unyields the TransactionParticipant. The unyield either fails with TransactionCommitted or does not fail, for the same reason as above (the active txnNumber has changed).


 Comments   
Comment by Githook User [ 05/May/22 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'mao.cheahuychou@gmail.com', 'username': 'cheahuychou'}

Message: SERVER-66110 Downgrading FCV can cause the active txnNumber on TransactionParticipant to change between session yielding and unyielding

(cherry picked from commit 6d06540e287aefa96b8acabb72d5c879d43ad4e9)
Branch: v6.0
https://github.com/mongodb/mongo/commit/c21af7e992ac945f694e469adbd608e3e6081975

Comment by Githook User [ 04/May/22 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'mao.cheahuychou@gmail.com', 'username': 'cheahuychou'}

Message: SERVER-66110 Downgrading FCV can cause the active txnNumber on TransactionParticipant to change between session yielding and unyielding
Branch: master
https://github.com/mongodb/mongo/commit/6d06540e287aefa96b8acabb72d5c879d43ad4e9
