- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.2, v5.0
- Sharding 2021-12-13, Sharding 2021-12-27
- 23
- 2
Context
The test txn_commit_optimizations_for_read_only_shards.js runs transactions with the coordinateCommitReturnImmediatelyAfterPersistingDecision server parameter enabled.
With this parameter set, the commitTransaction command returns as soon as the _decisionPromise is emplaced (either successfully or with an error).
As a result, the next test case can start before the TransactionCoordinator has finished with the existing transaction, which is part of the coverage for this test.
The problem
For certain test cases, the beforeStatements function stops server replication, meaning that the secondary stops applying oplog entries.
This makes the following sequence possible:
- txnNumber 51 starts. If the secondary has fallen behind on the oplog, the opTime for decisionPersisted hasn't been reached yet, but execution continues and the _decisionPromise is emplaced.
- The next test case starts for txnNumber 52 and replication is completely stopped. This leaves the existing transaction (txnNumber 51) stuck waiting for majority write concern.
- The new test case gets stuck waiting for txnNumber 51 to exit the prepared state.
- Since the new test case can never finish (because txnNumber 52 is waiting for the previous transaction to exit the prepared state), replication is never restarted and txnNumber 51 can never finish.
- The test therefore hangs forever.
Since the issue arises from the test stopping replication with the coordinateCommitReturnImmediatelyAfterPersistingDecision flag enabled, this is a test-only problem.
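The circular wait described above can be illustrated with a toy model in plain JavaScript. This is only a sketch: findHang, stopsReplication, and the other names are hypothetical, none of it is MongoDB code, and it assumes each test case restarts replication when it finishes.

```javascript
// Toy model only: every name here is hypothetical and none of it is MongoDB code.
// Returns the first test case that would block forever, or null if none does.
function findHang(testCases) {
    let replicationRunning = true;
    let unfinishedTxn = null;  // txnNumber still waiting for majority write concern

    for (const {txnNumber, stopsReplication} of testCases) {
        // beforeStatements may stop replication for this test case.
        if (stopsReplication) {
            replicationRunning = false;
        }
        // The previous transaction can only leave the prepared state while
        // replication is running; otherwise the new test case blocks on it.
        if (unfinishedTxn !== null && !replicationRunning) {
            return {blockedTxn: txnNumber, waitingOn: unfinishedTxn};
        }
        // commitTransaction returns early, so this transaction may still be
        // waiting for its decision to be majority-committed.
        unfinishedTxn = txnNumber;
        // Assumption: the test case restarts replication when it finishes.
        replicationRunning = true;
    }
    return null;
}

const hang = findHang([
    {txnNumber: 51, stopsReplication: false},
    {txnNumber: 52, stopsReplication: true},
]);
console.log(hang);
```

In this model, txnNumber 52 is reported as blocked on txnNumber 51, which matches the sequence in the list above.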
Proposed Solution
If we waited for the existing transaction to finish before moving on to the next test case, either in the cleanUp option available in the failureMode or in the for loop itself, this problem wouldn't occur: the new test case would not be able to stop replication before the transaction had finished.
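A minimal sketch of the ordering this proposal implies, written as plain JavaScript with stub hooks. The names runTestCases and waitForTransactionToFinish are placeholders for illustration and are not the real test's helpers; the comment on cleanUp is likewise an assumption about what it does.

```javascript
// Hypothetical sketch: hook names are placeholders, not the real test's helpers.
function runTestCases(testCases, hooks) {
    for (const testCase of testCases) {
        hooks.beforeStatements(testCase);  // may stop replication
        hooks.runTransaction(testCase);    // commitTransaction returns early
        hooks.cleanUp(testCase);           // assumed to restart replication
        // Proposed extra step: do not start the next test case until the
        // coordinator has finished with this transaction.
        hooks.waitForTransactionToFinish(testCase);
    }
}

// Stub hooks that only record the order in which the steps run.
const events = [];
const recordingHooks = {
    beforeStatements: (tc) => events.push(`before ${tc.txnNumber}`),
    runTransaction: (tc) => events.push(`commit ${tc.txnNumber}`),
    cleanUp: (tc) => events.push(`cleanUp ${tc.txnNumber}`),
    waitForTransactionToFinish: (tc) => events.push(`finished ${tc.txnNumber}`),
};

runTestCases([{txnNumber: 51}, {txnNumber: 52}], recordingHooks);
console.log(events.join("\n"));
```

With the wait placed at the end of each iteration, txnNumber 51 is recorded as finished before beforeStatements runs for txnNumber 52, so the next test case cannot stop replication while the earlier transaction is still in flight.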
- related to
  SERVER-48060 Make tests only set server parameter for making transaction coordinator return decision early if the shards are in latest binVersion (Closed)