[SERVER-41776] If primary node fails to step down before shut down, it can leak a TransactionCoordinator which will eventually fail with some unexpected error like CallbackCanceled
Created: 14/Jun/19  Updated: 25/Oct/19  Resolved: 29/Jul/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Won't Fix Votes: 0
Labels: ShardedTxn:KnownBugs
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

When this happens, the leaked coordinator's CallbackCanceled error propagates up to the client.

This is unexpected behavior, not a data corruption issue. The problem is that mongos does not retry the coordinateCommitTransaction command against a different node in the coordinator shard when it gets back CallbackCanceled, because CallbackCanceled is not in mongos's whitelist of retryable errors. Normally, if the coordinator shard's primary steps down, mongos automatically retries coordinateCommitTransaction against the new primary, so the client does not have to retry the commit itself.
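A minimal sketch of why the error escapes, assuming a simplified model of the retry decision: nodes are simulated as plain Python callables, and the names RETRYABLE_ERRORS, CommandError, and coordinate_commit are hypothetical. The error code names in the set are an illustrative subset chosen for this sketch, not the actual whitelist compiled into mongos.

```python
# Illustrative subset of "retryable" error code names. This is an assumption
# for the sketch, not the actual whitelist compiled into mongos.
RETRYABLE_ERRORS = {"NotMaster", "PrimarySteppedDown", "InterruptedDueToReplStateChange"}


class CommandError(Exception):
    """An error reply from a (simulated) shard node, identified by code name."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


def coordinate_commit(nodes, max_attempts=3):
    """Send a simulated coordinateCommitTransaction to successive coordinator-shard nodes.

    Each entry in `nodes` is a hypothetical zero-argument callable standing in
    for one node: it returns a decision string or raises CommandError.
    """
    if not nodes or max_attempts < 1:
        raise ValueError("need at least one node and one attempt")
    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return node()
        except CommandError as err:
            if err.code_name not in RETRYABLE_ERRORS:
                raise  # not whitelisted (e.g. CallbackCanceled): reaches the client as-is
            last_error = err  # e.g. the primary stepped down: try the next node
    raise last_error


# Simulated nodes for the two scenarios in the description (hypothetical).
def stepped_down_primary():
    raise CommandError("PrimarySteppedDown")


def primary_with_leaked_coordinator():
    raise CommandError("CallbackCanceled")


def new_primary():
    return "committed"
```

In this model, coordinate_commit([stepped_down_primary, new_primary]) returns "committed", while coordinate_commit([primary_with_leaked_coordinator, new_primary]) re-raises CallbackCanceled without ever contacting the new primary, which is the behavior the description calls out.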

Also, I am not sure which error label the client will get back along with CallbackCanceled. However, the client can always rerun commitTransaction against mongos.
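The client-side workaround can be sketched as a small retry loop, assuming a hypothetical send_commit callable that sends commitTransaction to mongos for the current session. CommandError and commit_with_retry are illustrative names, not part of any MongoDB driver. Only CallbackCanceled is treated as retryable here, narrowly mirroring this issue; a real client would base the decision on whatever its driver exposes.

```python
class CommandError(Exception):
    """An error reply from mongos, identified by its code name (simulated)."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


def commit_with_retry(send_commit, max_attempts=3):
    """Rerun commitTransaction through mongos while it fails with CallbackCanceled.

    `send_commit` is a hypothetical zero-argument callable that sends
    commitTransaction to mongos for the current session and returns the reply.
    Any other error, or CallbackCanceled on the last attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send_commit()
        except CommandError as err:
            # Only the error from this issue is retried; everything else surfaces.
            if err.code_name != "CallbackCanceled" or attempt == max_attempts:
                raise
```

Keeping the retry narrow avoids masking unrelated commit failures, at the cost of leaving other transient errors to whatever retry logic the caller already has.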



 Comments   
Comment by Ratika Gandhi [ 29/Jul/19 ]

Marked BF as Trivial.

Generated at Thu Feb 08 04:58:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.