[SERVER-39991] Add transactions workloads to failover concurrency suites Created: 06/Mar/19  Updated: 29/Oct/23  Resolved: 22/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication, Sharding, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: prepare_testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-40201 Retrying commit on original router af... Closed
depends on SERVER-40202 TransactionCoordinator doesn't update... Closed
depends on SERVER-39036 Stop pinning stable timestamp behind ... Closed
depends on SERVER-40069 GlobalLockAcquisitionTracker::getGlob... Closed
depends on SERVER-40081 Move session checkout to before comma... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-03-25, Sharding 2019-04-08, Sharding 2019-04-22
Participants:

 Description   

The transactions workloads are currently blacklisted from these suites. Unblacklisting them would give us substantial failover coverage of transactions.



 Comments   
Comment by Githook User [ 22/Apr/19 ]

Author:

{'name': 'Jack Mulrow', 'username': 'jsmulrow', 'email': 'jack.mulrow@mongodb.com'}

Message: SERVER-39991 Enable transaction workloads in concurrency sharded stepdown suites
Branch: master
https://github.com/mongodb/mongo/commit/80820aa72be8d244a87b30d3ebceb341aa888b0c

Comment by Jack Mulrow [ 19/Mar/19 ]

After some initial work, it turns out this depends on (at least) several bug fixes:

  1. There is a deadlock during stepdown between the stepdown thread and commands with checked-out sessions, which should be fixed by SERVER-40081.
  2. The TransactionCoordinator doesn't update the replica set monitor, so failovers are detected very slowly, which times out the test infrastructure: SERVER-40202.
  3. Retrying prepare sets the client's last opTime to the prepare opTime. This makes waitForWriteConcern fail if that opTime is from a different term, which can happen after a failover. This can be fixed by SERVER-40069, or by manually setting the last op for retried prepares to the system last op, as was done (and later reverted) in SERVER-37886. As noted in SERVER-37886, neither fix can be done until SERVER-39036 is completed.
  4. Mongos incorrectly retries read-only commits using two-phase commit, which leads to coordinator timeouts: SERVER-40201.
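To illustrate item 4, here is a minimal Python sketch of how a router might pick a commit protocol from its participant list and reuse that decision on retry, so a read-only transaction is never escalated to two-phase commit. All names here (CommitType, Participant, choose_commit_type, HypotheticalRouterTxn) are hypothetical and are not the actual mongos code. The "record on first attempt, reuse on retry" rule is an assumption about one way to avoid the described problem, not the fix that landed in SERVER-40201.

```python
from dataclasses import dataclass
from enum import Enum


class CommitType(Enum):
    # Hypothetical protocol choices a router could make at commit time.
    NO_SHARDS = "noShards"
    SINGLE_SHARD = "singleShard"
    READ_ONLY = "readOnly"
    SINGLE_WRITE_SHARD = "singleWriteShard"
    TWO_PHASE_COMMIT = "twoPhaseCommit"


@dataclass(frozen=True)
class Participant:
    shard_id: str
    read_only: bool  # True if this shard only served reads in the transaction


def choose_commit_type(participants):
    """Pick the cheapest protocol that is safe for this participant set."""
    if not participants:
        return CommitType.NO_SHARDS
    if len(participants) == 1:
        return CommitType.SINGLE_SHARD
    writers = [p for p in participants if not p.read_only]
    if not writers:
        # No shard wrote anything, so there is nothing to coordinate.
        return CommitType.READ_ONLY
    if len(writers) == 1:
        return CommitType.SINGLE_WRITE_SHARD
    return CommitType.TWO_PHASE_COMMIT


class HypotheticalRouterTxn:
    """Router-side transaction state (assumed shape, for illustration only)."""

    def __init__(self, participants):
        self.participants = list(participants)
        self.commit_type = None  # fixed on the first commit attempt
        self.attempts = 0

    def commit(self):
        self.attempts += 1
        # Assumption: the first attempt decides the protocol, and every retry
        # reuses it instead of defaulting to two-phase commit, which would hand
        # a read-only transaction to a coordinator with nothing to coordinate.
        if self.commit_type is None:
            self.commit_type = choose_commit_type(self.participants)
        return self.commit_type
```

For example, a transaction that read from two shards and wrote to neither keeps the read-only protocol across retries: `txn = HypotheticalRouterTxn([Participant("s0", True), Participant("s1", True)])` returns `CommitType.READ_ONLY` from both the first `txn.commit()` and the retried one.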
Generated at Thu Feb 08 04:53:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.