[SERVER-41776] If primary node fails to step down before shut down, it can leak a TransactionCoordinator which will eventually fail with some unexpected error like CallbackCanceled
Created: 14/Jun/19 Updated: 25/Oct/19 Resolved: 29/Jul/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | ShardedTxn:KnownBugs | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||
| Linked BF Score: | 0 | ||||
| Description |
| When this happens, the CallbackCanceled error propagates up to the client. This is merely odd behavior, not a data corruption issue. The oddity is that mongos will not retry the coordinateCommitTransaction command against a different node in the coordinator shard when it gets back CallbackCanceled, because CallbackCanceled is not in the whitelist of retryable errors. Normally, if the coordinator shard's primary steps down, mongos automatically retries coordinateCommitTransaction against the new primary, so the client does not have to do this itself. It is also unclear which error label the client will receive along with CallbackCanceled. Either way, the client can always rerun commitTransaction against mongos. |
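A minimal sketch of the retry decision described above, assuming a simplified router that tries the nodes of the coordinator shard in order and only moves on when the error is retryable. The `send_coordinate_commit` helper and the `RETRYABLE_ERRORS` set are hypothetical: the set holds a few real server error names for illustration, but it is not the actual whitelist mongos uses.

```python
# Hypothetical, illustrative subset of retryable error names.
# This is NOT the real whitelist used by mongos.
RETRYABLE_ERRORS = frozenset({
    "NotMaster",
    "PrimarySteppedDown",
    "InterruptedDueToReplStateChange",
    "ShutdownInProgress",
    "HostUnreachable",
})


def send_coordinate_commit(nodes, send):
    """Hypothetical model of routing coordinateCommitTransaction.

    `send(node)` returns None on success, or an error name (str) on failure.
    The router moves on to the next node only for retryable errors; any
    other error is returned at once and would surface to the client.
    """
    if not nodes:
        raise ValueError("no candidate nodes in the coordinator shard")
    last_error = None
    for node in nodes:
        err = send(node)
        if err is None:
            return None  # commit decision obtained
        last_error = err
        if err not in RETRYABLE_ERRORS:
            return err  # e.g. CallbackCanceled: not retried on another node
    return last_error  # every node failed with a retryable error


# Normal stepdown: the old primary reports PrimarySteppedDown and the
# router retries against the new primary, which succeeds.
stepdown = {"old-primary": "PrimarySteppedDown", "new-primary": None}
print(send_coordinate_commit(["old-primary", "new-primary"], stepdown.get))  # None

# Leaked coordinator: CallbackCanceled is not retryable, so it is returned
# directly; the client would then have to rerun commitTransaction itself.
leaked = {"old-primary": "CallbackCanceled", "new-primary": None}
print(send_coordinate_commit(["old-primary", "new-primary"], leaked.get))  # CallbackCanceled
```

The design choice being modeled is an allowlist: only errors known to mean "try another node" trigger a retry, so an unexpected error such as CallbackCanceled falls through to the caller instead of being retried.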
| Comments |
| Comment by Ratika Gandhi [ 29/Jul/19 ] |
| Marked BF as Trivial. |