[SERVER-66847] txn_single_write_shard_failover.js should wait for primary re-election before retrying commitTransaction Created: 27/May/22  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rachita Dhawan Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: neweng, sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2022-05-27-14-05-17-859.png     PNG File image-2022-05-27-14-06-52-343.png    
Issue Links:
Depends
Assigned Teams:
Cluster Scalability
Operating System: ALL
Participants:
Linked BF Score: 4
Story Points: 3

 Description   

This test runs a single-write-shard transaction that commits. The client then retries the commit, and a read-only shard fails over before the second commit attempt.
 
The issue is that the read-only shard has 2 replicas: the test steps down the primary but does not ensure that a new primary has been elected before making the second commit attempt. When the replica set has not yet elected a primary, the retry fails with a 'Could not find host matching Primary state' error.
 
Suggested fix: Ensure that a primary has been re-elected before retrying the commit, and verify that the former secondary is the node now elected as primary.

 

 

jsTest.log("Induce a failover on the read shard.");
assert.commandWorked(st.rs0.getPrimary().adminCommand({replSetStepDown: 60, force: true}));
jsTest.log("Make second attempt to commit, should still return that the transaction committed");
assert.commandWorked(session.commitTransaction_forTesting());
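
One way the suggested fix could look is sketched below. This is only an illustration, not the actual patch: `awaitNewPrimary` is a hypothetical helper name, and the sketch assumes that the replica set object's getPrimary() blocks until the set has a primary and returns a connection with a `host` field.

```javascript
// Hypothetical helper; not part of the existing test or of the test library.
// Assumptions: rs.getPrimary() blocks until the set has a primary and returns
// a connection object with a `host` string; `oldPrimary` is the connection
// that was captured before the step-down.
function awaitNewPrimary(rs, oldPrimary) {
    const newPrimary = rs.getPrimary();
    // With 2 replicas and a forced step-down, the former secondary is the
    // only node that should be able to win the election.
    if (newPrimary.host === oldPrimary.host) {
        throw new Error("expected a different primary after step-down, but " +
                        newPrimary.host + " is still primary");
    }
    return newPrimary;
}
```

In the test, the primary would be captured with st.rs0.getPrimary() before the replSetStepDown command, and awaitNewPrimary(st.rs0, oldPrimary) would be called between the step-down and the second commitTransaction_forTesting() call.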

 
 


Generated at Thu Feb 08 06:06:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.