[SERVER-42699] Server should perform majority write concern no-op write before returning TransientTransactionError label on commitTransaction cmd. Created: 08/Aug/19  Updated: 10/Dec/19  Resolved: 10/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Suganthi Mani Assignee: Tess Avitabile (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Repl 2019-12-30
Participants:

 Description   

For replica set transaction, we can get into double committing problem for below scenario (see SPEC-1185).

1) The driver is connected to a replica set where node A is primary.
2) The driver sends commitTransaction to A with w:1. A commits the transaction but dies before it can reply. This constitutes a retryable error, which means the driver can retry the commitTransaction command.
3) Node B is briefly elected.
4) The driver retries commitTransaction on B, and B replies with an error label TransientTransactionError (TTE). This implies that the driver may retry the entire transaction. (To be noted, driver always retry the commitTransaction with {w: majority}).
5) Node A revives before B has done any w:majority writes and gets reelected as primary. The driver then retries the entire transaction on A where it commits successfully. This leads to double committing problem.

Currently TTE label is attached to the following error codes.

  1. ErrorCodes::NoSuchTransaction
  2. ErrorCodes::WriteConflict
  3. ErrorCodes::LockTimeout
  4. ErrorCodes::PreparedTransactionInProgress
  5.  Snapshot Error
  6. Retargetting Error - ***Doesn't apply to replica set transaction*** (Correct me if I am wrong).

But, only for ErrorCodes::NoSuchTransaction, we perform no-op writes w/ {w: majority} and check the write concern error to attach TTE label.

Its safe now not to do majority write concern no-op writes for other error codes which can generate TTE label. Because, we currently perform below sequence of steps for commitTransaction cmd (for unprepared transactions/ single replica sets).
Seq #1  If session option doesn't contain "startTransaction", do _continueMultiDocumentTransaction() .
/** TTE error codes from 2- 5 is possible only from here onwards ****/
Seq #2 Else, execute  _beginMultiDocumentTransaction() - starts a new transaction with next higher txn number.
Seq #3 Unstash the transaction lock resources.
Seq #4 Abort the unprepared transaction if seq#3 fails. Else proceed to seq #5.
Seq #5 Run the (commit) command. 
Seq #6  If step 3 fails, then abort the unprepared transaction. Else, stash the transaction lock resources only if they are not committed or aborted.
 

To be noted, commitTxn cmd can't contain 'startTransaction'. So, it should always go to _continueMultiDocumentTransaction().
According to me, there can be only 2 cases
Case #1 Node B has already committed the transaction via secondary oplog application. And, then the driver retries the commitTransaction on node B.
   - This won't even go to unstash step (seq #3) as there is nothing to unstash. So, it's impossible to return TTE error codes. ( I am assuming that the sessions are all single threaded).
Case #2 Node B has not received/applied the commitTransaction oplog entry. And, then the driver retries the commitTransaction on node B.
   - This can only generate NoSuchTransaction Error code.

The system won't be safe, if we get any TTE label error codes from 2 thru 5 (ErrorCodes::WriteConflict to Snapshot error) before seq #1 beginOrContinue() (i.e. _continueMultiDocumentTransaction()).  This means we should perform no-op writes {w:majority} and check for majority write concern error before attaching the TTE label for all TTE error codes. Performing no-op writes always has a performance cost. So, instead we should try to strengthen the contract of TTE label itself.

Note: For Cross-shard transaction, the txn coordinator doesn't care about TTE label attached to the commitTransaction cmd response.



 Comments   
Comment by Suganthi Mani [ 12/Aug/19 ]

My proposal for TTE error label to be safe to retry the entire transaction from beginning is that, before sending TTE label response for commitTransaction cmd, we should make sure we perform one of 2 things:

1) For active txn (i.e. txn number matches the current active txn number), we should abort the transaction implicitly if it's not aborted previously.
2) For inactive txn (i.e. txn number doesn't match the current active txn number), we should make sure the transaction will be rolled back eventually if it was previously committed.

Only for case#2, we would require to perform a majority write concern no-op (only if the write concern supplied with cmd was kMajority). Otherwise, its safe to send the TTE label response w/o no-op write.

Note: Currently in our code, NoSuchTransaction error code is thrown in cases where we don't require no-op write.

CC esha.maharishi shane.harvey
And, tagging siyuan.zhou if anything needed to considered for large transactions > 16MB.

Comment by Judah Schvimer [ 08/Aug/19 ]

suganthi.mani, should we also abort a transaction whenever we return a TTE error label?

Generated at Thu Feb 08 05:01:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.