  1. Core Server
  2. SERVER-41776

If a primary node fails to step down before shutdown, it can leak a TransactionCoordinator, which will eventually fail with an unexpected error such as CallbackCanceled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sharding
    • Operating System: ALL
    • Linked BF Score: 14

      Description

      The CallbackCanceled error from the leaked TransactionCoordinator then propagates up to the client.

      This is just odd behavior, not a data corruption issue. On receiving CallbackCanceled, mongos does not retry the coordinateCommitTransaction command against a different node in the coordinator shard, because CallbackCanceled is not in the whitelist of retryable errors. Normally, if the coordinator shard's primary steps down, mongos automatically retries coordinateCommitTransaction against the new primary, and the client does not have to do this itself.
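      To illustrate why the error reaches the client, here is a minimal Python sketch of whitelist-based retry. It is not the mongos implementation: `run_with_retry`, `CommandError`, `fake_send`, and the node names are hypothetical, and `RETRYABLE_ERRORS` is only an illustrative subset of error names, not the server's actual list.

```python
# Assumption: an illustrative subset of retryable error names, not the real whitelist.
RETRYABLE_ERRORS = frozenset({"NotMaster", "PrimarySteppedDown", "HostUnreachable"})


class CommandError(Exception):
    """Hypothetical error type carrying the error's code name."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


def run_with_retry(nodes, send, retryable=RETRYABLE_ERRORS):
    """Call send(node) on each node in turn, moving on only for whitelisted errors."""
    last_error = None
    for node in nodes:
        try:
            return send(node)
        except CommandError as err:
            if err.code_name not in retryable:
                # Not whitelisted: surface the error to the caller immediately.
                raise
            last_error = err
    if last_error is None:
        raise ValueError("no nodes to try")
    raise last_error


def fake_send(failing_code):
    """Build a send function whose old primary fails with the given code name."""
    def send(node):
        if node == "old-primary":
            raise CommandError(failing_code)
        return {"ok": 1, "node": node}
    return send
```

      With `fake_send("PrimarySteppedDown")` the call moves on to the second node and succeeds. With `fake_send("CallbackCanceled")` the error is raised on the first attempt, which mirrors the behavior described in this ticket.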

      Also, I am not sure what error label the client will get back along with CallbackCanceled. However, they can always rerun commitTransaction against mongos.
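      A client-side rerun could look roughly like the following Python sketch. It assumes a session object exposing a `commit_transaction()` method (the method name used by PyMongo's `ClientSession`); `commit_with_rerun`, `max_attempts`, and `delay_seconds` are hypothetical names. Because the error label returned with CallbackCanceled is unknown here, the sketch reruns on any exception, which real code would likely want to narrow.

```python
import time


def commit_with_rerun(session, max_attempts=3, delay_seconds=0.0):
    """Call session.commit_transaction() up to max_attempts times.

    Returns the 1-based attempt number that succeeded, or re-raises the
    last error once the attempts are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            session.commit_transaction()
            return attempt
        except Exception:
            # Assumption: any failure is worth one more try, since the
            # error label attached to CallbackCanceled is not known.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

      The attempt limit keeps a persistent failure from looping forever.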
