Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.3.0, 5.0.6
Affects Version/s: None
Component/s: Sharding
Labels:

Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.2, v5.0
Sprint: Sharding 2021-12-13, Sharding 2021-12-27
Linked BF Score: 23
Story Points: 2
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Context
The test txn_commit_optimizations_for_read_only_shards.js runs transactions with the coordinateCommitReturnImmediatelyAfterPersistingDecision server parameter enabled.

With this parameter set, the commitTransaction command returns as soon as the _decisionPromise is emplaced (either successfully or with an error).

As a result, the next test case can start before the TransactionCoordinator has finished with the existing transaction. This overlap is part of the coverage for this test.

The problem
For certain test cases, the beforeStatements function stops server replication, meaning that the secondary stops applying oplog entries.

This makes the following sequence possible:

1. txnNumber 51 starts. If the secondary has fallen behind on the oplog, the opTime for decisionPersisted hasn't been reached yet, but execution continues and the _decisionPromise is emplaced.
2. The next test case starts for txnNumber 52 and stops replication completely. This leaves the existing transaction (txnNumber 51) stuck waiting for majority write concern.
3. The new test case gets stuck waiting for txnNumber 51 to exit the prepared state.
4. Since the new test case can never finish (because txnNumber 52 is waiting for the previous transaction to exit the prepared state), replication is never restarted and txnNumber 51 can never finish.
5. The test hangs forever.

Since the issue arises from the test stopping replication while the coordinateCommitReturnImmediatelyAfterPersistingDecision flag is enabled, this is a test-only problem.
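
The hang described above is a circular wait. As a rough illustration only, the dependencies can be written as a small wait-for graph and checked for a cycle. The node labels and the findWaitCycle helper are made up for this sketch and do not correspond to identifiers in the server or the test.

```javascript
// Toy wait-for graph: each key cannot make progress until its value does.
// The labels are illustrative only, not names from the server code.
const waitsFor = {
    "txn 51 exits prepared state": "decision is majority committed",
    "decision is majority committed": "replication is restarted",
    "replication is restarted": "txn 52 test case finishes",
    "txn 52 test case finishes": "txn 51 exits prepared state",
};

// Follow the wait-for chain from `start`. Returns the nodes that form a
// cycle, or null if the chain ends without revisiting a node.
function findWaitCycle(graph, start) {
    const path = [];
    let node = start;
    while (node !== undefined) {
        const seenAt = path.indexOf(node);
        if (seenAt !== -1) {
            return path.slice(seenAt);
        }
        path.push(node);
        node = graph[node];
    }
    return null;
}

const cycle = findWaitCycle(waitsFor, "txn 51 exits prepared state");
console.log(cycle ? "circular wait: " + cycle.join(" -> ") : "no circular wait");
```

Removing any one edge, for example by restarting replication independently of the txn 52 test case, breaks the cycle in this model.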

Proposed Solution
The problem would not occur if the test waited for the existing transaction to finish before moving on to the next test case, either in the cleanUp option available in the failureMode or in the for loop itself. The new test case would then be unable to stop replication before the previous transaction had finished.
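
The effect of the proposed wait can be sketched with a toy simulation. This is not the actual test code or the committed fix: makeCluster, waitForPreparedTxnToFinish, runCases, and the tick-based model of replication are all assumptions made for illustration, and a bounded number of ticks stands in for "waits forever".

```javascript
// Toy model: a transaction whose commit returned early stays "prepared"
// until replication makes progress. All names here are hypothetical.
function makeCluster() {
    return {
        replicationStopped: false,
        preparedTxn: null, // txnNumber still waiting for majority write concern
        tick() {
            // One unit of background progress; it only helps if replication runs.
            if (!this.replicationStopped) {
                this.preparedTxn = null;
            }
        },
    };
}

// Poll for at most maxTicks; returns true if no transaction is left prepared.
function waitForPreparedTxnToFinish(cluster, maxTicks) {
    for (let i = 0; i < maxTicks && cluster.preparedTxn !== null; i++) {
        cluster.tick();
    }
    return cluster.preparedTxn === null;
}

function runCases(cluster, cases, waitForPreviousTxn, maxTicks = 10) {
    const finished = [];
    for (const {txnNumber, stopReplication} of cases) {
        // Proposed change: drain the previous transaction before the next
        // test case gets a chance to stop replication.
        if (waitForPreviousTxn && !waitForPreparedTxnToFinish(cluster, maxTicks)) {
            return {finished, hungOn: txnNumber};
        }
        if (stopReplication) {
            cluster.replicationStopped = true;
        }
        // The new transaction blocks while the earlier one is still prepared.
        if (!waitForPreparedTxnToFinish(cluster, maxTicks)) {
            return {finished, hungOn: txnNumber};
        }
        // commitTransaction returns early: decision made, not yet majority committed.
        cluster.preparedTxn = txnNumber;
        cluster.replicationStopped = false; // the test case restarts replication when done
        finished.push(txnNumber);
    }
    return {finished, hungOn: null};
}

const testCases = [
    {txnNumber: 51, stopReplication: false},
    {txnNumber: 52, stopReplication: true},
];
console.log("without wait:", JSON.stringify(runCases(makeCluster(), testCases, false)));
console.log("with wait:", JSON.stringify(runCases(makeCluster(), testCases, true)));
```

In this model the run without the wait reports a hang on txnNumber 52, while the run with the wait completes both test cases. In the real test, the wait would presumably live in the cleanUp hook or the loop body, as the proposal describes.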

related to

SERVER-48060 Make tests only set server parameter for making transaction coordinator return decision early if the shards are in latest binVersion

Closed

Assignee: Matt Boros
Reporter: Luis Osta (Inactive)
Participants: Githook User, Luis Osta, Matt Boros, Max Hirschhorn
Votes: 0
Watchers: 3

Created: Oct 12 2021 04:47:05 PM UTC
Updated: Oct 29 2023 09:47:30 PM UTC
Resolved: Dec 29 2021 06:39:43 PM UTC
Confidence Status Last Update: 02/Dec/21 3:54 PM
