Core Server / SERVER-41776

If a primary node fails to step down before shutting down, it can leak a TransactionCoordinator, which will eventually fail with an unexpected error such as CallbackCanceled

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding

      The CallbackCanceled error then propagates up to the client.

      This is unexpected behavior, not a data corruption issue. On getting back CallbackCanceled, mongos will not retry the coordinateCommitTransaction command against a different node in the coordinator shard, because CallbackCanceled is not in the whitelist of retryable errors. Normally, if the coordinator shard's primary steps down, mongos automatically retries coordinateCommitTransaction against the new primary, so the client does not have to do this itself.
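      The retry decision described above can be illustrated with a toy model. This is a sketch, not mongos code: `CommandError`, `coordinate_commit`, `send`, and the contents of `RETRYABLE_ERRORS` are all assumptions made for illustration, and the real whitelist may differ. The only point it shows is that an error outside the set is rethrown immediately instead of being retried against the next node.

```python
# Toy model of the retry decision; none of these names come from the server code.
# The set contents are placeholders: the actual whitelist in mongos may differ.
RETRYABLE_ERRORS = {"NotMaster", "InterruptedDueToStepDown", "ShutdownInProgress"}


class CommandError(Exception):
    """Hypothetical error type carrying the server's error code name."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


def coordinate_commit(nodes, send):
    """Send the command to each node in turn; move on only after a retryable error."""
    if not nodes:
        raise ValueError("no nodes to target")
    last_error = None
    for node in nodes:
        try:
            return send(node)
        except CommandError as err:
            if err.code_name not in RETRYABLE_ERRORS:
                # Not whitelisted (e.g. CallbackCanceled): surface it to the caller.
                raise
            last_error = err
    # Every node failed with a retryable error; report the last one.
    raise last_error


def make_send(failures):
    """Build a fake `send` that fails on the nodes listed in `failures`."""

    def send(node):
        if node in failures:
            raise CommandError(failures[node])
        return f"committed via {node}"

    return send
```

      With `make_send({"a": "InterruptedDueToStepDown"})`, the call moves on from node `a` to node `b`. With `make_send({"a": "CallbackCanceled"})`, the error reaches the caller without node `b` ever being tried, which mirrors the behavior reported here.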

      Also, I am not sure what error label the client will get back along with CallbackCanceled. However, they can always rerun commitTransaction against mongos.
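      As a rough sketch of the client-side workaround, the helper below reruns a commit a bounded number of times. `commit_with_rerun` and its `commit` argument are hypothetical stand-ins for whatever call issues commitTransaction against mongos. A real driver would normally decide based on error labels, and since the label returned with CallbackCanceled is unknown here, this sketch simply reruns on any exception.

```python
def commit_with_rerun(commit, max_attempts=3):
    """Call commit() again after a failure, giving up after max_attempts tries.

    `commit` is a hypothetical zero-argument callable that issues
    commitTransaction against mongos and raises on failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return commit()
        except Exception:
            # Out of attempts: let the last error reach the caller.
            if attempt == max_attempts:
                raise
```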

            Assignee:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: