Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-39624

Reordering of transaction requests from mongos can lead shards to unexpectedly have in-progress transactions

    • Fully Compatible
    • ALL
    • Sharding 2019-03-11
    • 20

      While running a sharded transaction, the router tracks in-memory each shard that has been involved in the transaction. When a new shard is targeted by a statement, the router adds it to the participant list as a pending participant and attaches startTransaction=true and the active transaction's txnId to the next request sent to that shard.

      The shard is considered "pending" because of the shard version version protocol, which means the router won't know the shard was able to satisfy the request sent to it until it gets an OK response. To handle the case where a pending participant returns a stale version error, the router will abort the active txnId on each pending participant (not just the one that returned the error), wait for each to respond, then retry the current statement, possibly targeting a different set of shards. This allows the router to handle these errors within a transaction, instead of returning a transient error and making the client retry with a new txnId.

      To enable this behavior, shards can accept a request with startTransaction=true more than once for a txnId, but only if the shard's local transaction is in the aborted state. This relies on the abort sent to every pending participant before retrying reaching each one after the first requests with startTransaction=true, which is not guaranteed in an asynchronous network. If the abort arrives before the first request on a shard and that shard is targeted by the retry, the retry will be rejected by that shard because it will have an in-progress transaction at that txnId.

            Assignee:
            jack.mulrow@mongodb.com Jack Mulrow
            Reporter:
            jack.mulrow@mongodb.com Jack Mulrow
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: