[SERVER-36309] Confirm that if and only if an FSM transaction successfully commits, that all of its writes appear in the database Created: 26/Jul/18  Updated: 29/Oct/23  Resolved: 28/Sep/18

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: prepare_testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-09-24, Sharding 2018-10-08
Participants:
Linked BF Score: 0

 Description   

This was a missing piece of the concurrency testing in 4.0. We ensured that there were no transaction cycles, but not that all transactions that should have committed successfully committed.

We should do this in as many workloads as makes sense.

This will be especially useful in the sharding extension with KillSessions being called concurrently.



 Comments   
Comment by Githook User [ 28/Sep/18 ]

Author:

{'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'}

Message: SERVER-36309 Confirm that iff a FSM txn successfully commits that all of its writes appear in the db
Branch: master
https://github.com/mongodb/mongo/commit/fa0e61f4042c325f5ec8903094bbe5fd2aa80c4e

Comment by Githook User [ 22/Sep/18 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: Revert "SERVER-36309 Confirm that iff a FSM txn successfully commits that all of its writes appear in the db"

This reverts commit 8bc9bcbf6e60017b4aab5a1cd4a7b674dc0a574f.
Branch: master
https://github.com/mongodb/mongo/commit/9f54046049edc156b38bacb3d4d7d78b60ab213b

Comment by Githook User [ 21/Sep/18 ]

Author:

{'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'}

Message: SERVER-36309 Confirm that iff a FSM txn successfully commits that all of its writes appear in the db
Branch: master
https://github.com/mongodb/mongo/commit/8bc9bcbf6e60017b4aab5a1cd4a7b674dc0a574f

Comment by Judah Schvimer [ 06/Aug/18 ]

what about when we there's a network error due to a stepdown when the mongo shell is running the commitTransaction command?

Retrying commitTransaction should return success or TransactionCommitted if it was successful, or should return NoSuchTransaction otherwise.

I had assumed there may be other error cases that must be retried in order to ensure exactly once semantics which is why I brought up UnknownTransactionCommitResult.

What other error cases were you thinking about? Is retrying commitTransaction not sufficient? The full-transaction retryability problem in the concurrency framework may get in the way of the single-replica set step down suite though with 'prepare'.

Comment by Max Hirschhorn [ 03/Aug/18 ]

I think that if only one client is using each Session, we are guaranteed to know the result of the transaction.

judah.schvimer, I'll agree to you have the potential to learn of the definitive result of the transaction in that case; however, what about when we there's a network error due to a stepdown when the mongo shell is running the commitTransaction command? We had talked about adding a concurrency_replication_with_stepdowns.yml test suite to be able to exercise prepareTransaction without needing sharded transactions to be fully implemented (we already have a concurrency_sharded_with_stepdowns.yml test suite). Based on our documentation for how to handle retries for replica set transactions, I had assumed there may be other error cases that must be retried in order to ensure exactly once semantics which is why I brought up UnknownTransactionCommitResult.

It sounds like we may be able to defer changing the withTxnAndAutoRetry() function as part of this ticket but should probably note whatever changes we'd need to make it in the test plan for prepareTransaction.

Comment by Judah Schvimer [ 03/Aug/18 ]

I think that if only one client is using each Session, we are guaranteed to know the result of the transaction.

Comment by Max Hirschhorn [ 02/Aug/18 ]

This was a missing piece of the concurrency testing in 4.0. We ensured that there were no transaction cycles, but not that all transactions that should have committed successfully committed.

We can achieve this in the multi_statement_transaction_atomicity_isolation.js FSM workload (and thus all the workloads which derive from it) by having each worker thread scribble down in their $config.data the docIds which were updated as part of the transaction.

// Property keys are the .toString() form of the transaction numbers, values are list of documents
// by their _id which were updated.
{
    "NumberLong(0)": [0, 1, 2],
    // Transaction #1 was aborted for this thread and is therefore omitted.
    "NumberLong(1)": [0, 2, 3],
    ...
}

1. Use the DriverSession#getTxnNumber_forTesting() method which was added as part of SERVER-36372 to get the transaction number and record it in the metadata property along with the database and collection name.

2. Change the checkConsistency() state function to assert that the following properties hold on the documents array:

  • For every transaction number present in the data that was scribbled into $config.data, only the documents matching the _id's in the associated array have an entry for that combination of this.tid and txnNumber.
  • For every transaction number not present in the data that was scribbled into $config.data, none of the documents in documents have an entry for that combination of this.tid and txnNumber.

judah.schvimer, could you review the withTxnAndAutoRetry() function to verify whether it is guaranteed to learn the definite result for whether the transaction committed or not? I'm concerned that because the implementation doesn't separately retry on a UnknownTransactionCommitResult error response, it is possible for the transaction to have committed and for the client to not realize it. I believe that without addressing this, asserting that the second property holds isn't feasible.

Generated at Thu Feb 08 04:42:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.