[SERVER-73915] TransactionCoordinatorService may stall primary step-up from completing when replica set shard steps down and back up quickly Created: 11/Feb/23  Updated: 29/Oct/23  Resolved: 02/Aug/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.4.0, 5.0.0, 6.0.0, 6.3.0-rc0
Fix Version/s: 7.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: David Chen (Inactive)
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding NYC
Backwards Compatibility: Minor Change
Operating System: ALL
Participants:
Linked BF Score: 5
Story Points: 3

 Description   

Before a node can finish stepping up as primary, all TransactionCoordinators from the previous term in which the node was primary must have exited. The mechanisms for interrupting TransactionCoordinators involve interrupting active OperationContexts and shutting down the txn::AsyncWorkScheduler's TaskExecutor. However, the TransactionCoordinator also waits through the WaitForMajorityService, and that wait isn't guaranteed to be interrupted. As a result, the node completes its member state transition to PRIMARY but cannot exit "drain mode", so it cannot yet accept writes as primary.
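The stall can be illustrated with a small toy model. This is only a sketch in Python, not server code: the `Coordinator` class, its event names, and `step_up` are hypothetical stand-ins, and the `majority_cancelled` hook shows just one way such a wait could be made interruptible. The change in the linked commit may work differently.

```python
import threading


class Coordinator:
    """Toy stand-in for a TransactionCoordinator left over from the previous term."""

    def __init__(self):
        # Models shutting down the scheduler's executor on step-down.
        self.scheduler_shutdown = threading.Event()
        # Models the majority-wait completing on its own.
        self.majority_done = threading.Event()
        # Hypothetical cancellation hook for the majority wait.
        self.majority_cancelled = threading.Event()
        # Set once the coordinator has exited.
        self.exited = threading.Event()

    def run(self):
        # The coordinator blocks in the majority wait. It never checks
        # scheduler_shutdown here, which is the gap described above.
        while not (self.majority_done.is_set() or self.majority_cancelled.is_set()):
            self.majority_done.wait(timeout=0.01)
        self.exited.set()


def step_up(coordinator, timeout):
    """Return True if the old-term coordinator exited within the timeout."""
    return coordinator.exited.wait(timeout=timeout)


def demo():
    coord = Coordinator()
    threading.Thread(target=coord.run, daemon=True).start()

    # Interrupting only the scheduler does not reach the majority wait.
    coord.scheduler_shutdown.set()
    stalled = not step_up(coord, timeout=0.2)

    # Cancelling the majority wait lets the coordinator exit.
    coord.majority_cancelled.set()
    finished = step_up(coord, timeout=2.0)
    return stalled, finished
```

In this model, `demo()` first observes a stalled step-up and then a completed one after the majority wait is cancelled, which mirrors the node staying in drain mode until every old coordinator has exited.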

One visible symptom of this behavior is for the following message to be logged every 5 seconds.

[js_test:txn_two_phase_commit_basic] d20040| {"t":{"$date":"2023-02-10T22:40:06.714+00:00"},"s":"I",  "c":"TXN",      "id":22442,   "ctx":"OplogApplier-0","msg":"After 5 seconds of wait there are still sessions left with active coordinators which have not yet completed","attr":{"numSessionsLeft":1}}
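To check test or server logs for this symptom, a small script can look for the structured log entry with id 22442. The following is a minimal sketch; the helper name `sessions_left` is made up for illustration, and it assumes each line carries a single JSON object, optionally preceded by a test-runner prefix like the one quoted above.

```python
import json

# "id" field of the log message quoted above.
STILL_WAITING_LOG_ID = 22442


def sessions_left(line):
    """Return numSessionsLeft if the line is the id 22442 message, else None."""
    # Skip any runner prefix such as "[js_test:...] d20040| ".
    start = line.find("{")
    if start == -1:
        return None
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if entry.get("id") != STILL_WAITING_LOG_ID:
        return None
    return entry.get("attr", {}).get("numSessionsLeft")
```

A non-`None` result that keeps recurring at 5-second intervals for the same node would be consistent with the stall described in this ticket.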



 Comments   
Comment by Githook User [ 03/Aug/23 ]

Author:

{'name': 'David Chen', 'email': 'david.chen@mongodb.com', 'username': ''}

Message: SERVER-73915 - Stop TransactionCoordinator from hanging on step up
Branch: minh.luu-no_compile_sys-perf
https://github.com/mongodb/mongo/commit/35633e274e103866c6b49fa41d02f5297ee952d1

Comment by Githook User [ 02/Aug/23 ]

Author:

{'name': 'David Chen', 'email': 'david.chen@mongodb.com', 'username': ''}

Message: SERVER-73915 - Stop TransactionCoordinator from hanging on step up
Branch: master
https://github.com/mongodb/mongo/commit/35633e274e103866c6b49fa41d02f5297ee952d1
