[SERVER-59916] T{1, 2}Starts{First, Second}AndWins In WriteConflictHelpers Does Not Synchronize Committing Of Failed Transaction Created: 13/Sep/21  Updated: 29/Oct/23  Resolved: 01/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.18, 4.4.10, 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4, v4.2
Participants:
Linked BF Score: 48

 Description   

Summary

The WriteConflictHelpers used by the transaction_write_conflicts.js test do not ensure that every transaction they create is aborted or committed. This means that it's possible for the transaction-related locks to never be released, which, in the case of the associated BF, caused the validate_collections.js hook to hang forever.

This issue only occurs in multi-shard sharded collections, because in those cases it's possible for a transaction to start on one shard but not on another. The shards then differ in which transactions are "present" and able to be aborted or committed.

Context

The validate_collections.js test is a hook that runs after a test completes, in this case after transaction_write_conflicts.js.

The latter test uses T1StartsFirstAndWins and T2StartsSecondAndWins to define test cases that, given two separate operations, ensure that the correct one is prevented from taking effect and returns a WriteConflict. These helpers are defined in write_conflicts.js.

The Sequence

  1. The test runs the T1StartsFirstAndWins test case under the "multidelete-multiupdate" section. The writeConflictTest helper creates the session used to create the failing transaction.
  2. Using the created session/txn information, the provided operation is executed. It fails with a WriteConflict error on Shard 1, which causes the transaction to be aborted on Shard 1 (L-2368). This is the first abortTransaction command Shard 0 receives.
  3. The test helper commits the transaction, which causes the MongoS to send the commitTransaction command to the pertinent shards. The command is expected to fail with NoSuchTransaction because of the abort caused by the WriteConflict, but it also fails because Shard 0 hasn't started the transaction yet.
  4. The failure of the commitTransaction results in MongoS implicitly aborting the transaction on all pertinent shards. This is the second abortTransaction command Shard 0 receives.
  5. Shard 0 starts the transaction (L-2402). Because it was started after all of the aborts/commits, it stays open until transactionLifetimeLimitSeconds expires, which in the test environments is 24 hours.
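
The ordering above can be replayed with a toy model. The sketch below is not MongoDB code: ToyShard, its methods, and replaySequence are hypothetical names invented for illustration, and each shard is reduced to a single state field. It only follows the narration of the five steps and shows that Shard 0 ends up holding an open transaction that nothing will abort or commit.

```javascript
// Toy model of the sequence above. This is NOT MongoDB code: ToyShard and its
// methods are hypothetical stand-ins that only track one transaction's state.
class ToyShard {
  constructor(name) {
    this.name = name;
    this.state = "none"; // "none" | "open" | "aborted" | "committed"
  }

  startTransaction() {
    this.state = "open";
  }

  // Abort and commit both fail when the shard has no open transaction.
  abortTransaction() {
    if (this.state !== "open") return "NoSuchTransaction";
    this.state = "aborted";
    return "ok";
  }

  commitTransaction() {
    if (this.state !== "open") return "NoSuchTransaction";
    this.state = "committed";
    return "ok";
  }
}

function replaySequence() {
  const shard0 = new ToyShard("shard0");
  const shard1 = new ToyShard("shard1");
  const shard0Replies = [];

  // Steps 1-2: the transaction starts on shard 1, hits the WriteConflict and
  // is aborted there; the first abort also reaches shard 0.
  shard1.startTransaction();
  shard1.abortTransaction();
  shard0Replies.push(shard0.abortTransaction());

  // Step 3: the helper commits, but shard 0 still has nothing to commit.
  shard0Replies.push(shard0.commitTransaction());

  // Step 4: the failed commit triggers the second abort on shard 0.
  shard0Replies.push(shard0.abortTransaction());

  // Step 5: the delayed statement finally starts the transaction on shard 0.
  shard0.startTransaction();

  return { shard0: shard0.state, shard1: shard1.state, shard0Replies };
}

const result = replaySequence();
console.log(result);
```

In this model every abort and commit reaches Shard 0 before it has anything to end, so its final state is "open" while Shard 1 is "aborted".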

Proposed Solution
At the end of T1StartsFirstAndWins and T2StartsSecondAndWins, add the following logic:

  1. Start a new transaction with the session associated with the failing transaction.
  2. Send a no-op command to all of the shards (or alternatively all shards that didn't have the WriteConflict error).
  3. Commit the transaction started in step 1.
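
A minimal sketch of these three steps follows, assuming a session object that behaves like a mongo shell session (startTransaction, getDatabase, commitTransaction). The function name forceEndFailedTransaction and the parameters dbName and collName are hypothetical, and using an unfiltered find as the "no-op command" is an assumption: the ticket does not say which command the fix uses. The presumed effect is that every shard that sees the newer transaction on the session stops holding the older one.

```javascript
// Sketch only. `session` is expected to behave like a mongo shell session;
// `dbName` and `collName` are placeholders supplied by the caller.
function forceEndFailedTransaction(session, dbName, collName) {
  // Step 1: start a new transaction on the session that owned the failed one.
  session.startTransaction();

  // Step 2: run a harmless statement inside the new transaction. An
  // unfiltered read is assumed here so that it can target every shard
  // that owns part of the collection.
  session.getDatabase(dbName).getCollection(collName).find().itcount();

  // Step 3: commit the transaction started in step 1.
  session.commitTransaction();
}
```

A helper like this would be called once at the end of each test case, after the assertions on the WriteConflict have run, so that the cleanup does not interfere with the behavior under test.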

Relevant Logs

[ShardedClusterFixture:job2:shard1:primary] 2021-08-25T22:33:46.483+0000 D4 TXN      [conn51] New transaction started with txnNumber: 0 on session with lsid 93007dc5-7a2c-456a-8241-92fef7de25c5
[ShardedClusterFixture:job2:shard1:primary] 2021-08-25T22:33:46.487+0000 I  TXN      [conn51] transaction parameters:{ lsid: { id: UUID("93007dc5-7a2c-456a-8241-92fef7de25c5"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 0, autocommit: false, readConcern: { level: "snapshot", afterClusterTime: Timestamp(1629930826, 89) } }, readTimestamp:Timestamp(0, 0), terminationCause:aborted timeActiveMicros:4799 timeInactiveMicros:21 numYields:0 locks:{ ReplicationStateTransition: { acquireCount: { w: 6 } }, Global: { acquireCount: { r: 3, w: 2 } }, Database: { acquireCount: { r: 2, w: 2 } }, Collection: { acquireCount: { w: 2 } }, Mutex: { acquireCount: { r: 6 } }, oplog: { acquireCount: { r: 2 } } } storage:{} wasPrepared:0, 4ms
[ShardedClusterFixture:job2:mongos] 2021-08-25T22:33:46.481+0000 D3 TXN      [conn97] 93007dc5-7a2c-456a-8241-92fef7de25c5:0 New transaction started
[ShardedClusterFixture:job2:mongos] 2021-08-25T22:33:46.489+0000 D3 TXN      [conn97] 93007dc5-7a2c-456a-8241-92fef7de25c5:0 Implicitly aborting transaction on 2 shard(s) due to error: WriteConflict: Encountered error from localhost:20503 during a transaction :: caused by :: WriteConflict
[ShardedClusterFixture:job2:mongos] 2021-08-25T22:33:46.496+0000 I  TXN      [conn97] transaction parameters:{ lsid: { id: UUID("93007dc5-7a2c-456a-8241-92fef7de25c5"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 0, autocommit: false, readConcern: { afterClusterTime: Timestamp(1629930826, 89) } }, numParticipants:2, terminationCause:aborted, abortCause:WriteConflict, timeActiveMicros:15122, timeInactiveMicros:0, 15ms
[ShardedClusterFixture:job2:mongos] 2021-08-25T22:33:46.512+0000 D3 TXN      [conn97] 93007dc5-7a2c-456a-8241-92fef7de25c5:0 Implicitly aborting transaction on 2 shard(s) due to error: NoSuchTransaction: 93007dc5-7a2c-456a-8241-92fef7de25c5:0 Failed to commit transaction because a previous statement on the transaction participant shard-rs0 was unsuccessful.
[ShardedClusterFixture:job2:shard0:primary] 2021-08-25T22:33:46.517+0000 D4 TXN      [conn94] New transaction started with txnNumber: 0 on session with lsid 93007dc5-7a2c-456a-8241-92fef7de25c5
[ShardedClusterFixture:job2:shard0:primary] 2021-08-25T22:33:46.517+0000 D3 TXN      [conn94] Inserting coordinator 93007dc5-7a2c-456a-8241-92fef7de25c5:0 into in-memory catalog



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 01/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59916 Force failing transaction in write_conflicts to end
Branch: v4.2
https://github.com/mongodb/mongo/commit/467c0ddef7fa2009832179646077853f86840d1c

Comment by Githook User [ 01/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: Revert "SERVER-59916 Force failing transaction in write_conflicts to end"

This reverts commit 65228c9bd0da47c4f56f3b9bb13820f34d2dd9ea.
Branch: v4.2
https://github.com/mongodb/mongo/commit/f461ef000d66adb35194ee47ab7918794d0facf4

Comment by Githook User [ 01/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59916 Force failing transaction in write_conflicts to end
Branch: v4.2
https://github.com/mongodb/mongo/commit/65228c9bd0da47c4f56f3b9bb13820f34d2dd9ea

Comment by Githook User [ 30/Sep/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59916 Force failing transaction in write_conflicts to end
Branch: v5.0
https://github.com/mongodb/mongo/commit/928bb5f95a83e4566af3520cbbf20c385a445d13

Comment by Githook User [ 30/Sep/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59916 Force failing transaction in write_conflicts to end
Branch: v4.4
https://github.com/mongodb/mongo/commit/530a87be18dd1d86fab4e7bfb7672b930f880cd2

Comment by Githook User [ 30/Sep/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59916 Force failing transaction in write_conflicts to end
Branch: master
https://github.com/mongodb/mongo/commit/9c15a11d5ca98ed26d107ebfd2196033ed32a65d
