[SERVER-37344] Implement recovery token for retrying a commit command on a different mongos Created: 27/Sep/18  Updated: 29/Oct/23  Resolved: 17/Dec/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.1.7

Type: Task Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: ShardedTxn:DistributedCommit
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-37440 coordinateCommit should fall back to ... Closed
is depended on by DRIVERS-586 Support sharded transactions recovery... Closed
is depended on by SERVER-37851 Create stepdown resilient fsm workloa... Closed
Duplicate
is duplicated by SERVER-38420 Mongos returns confusing error messag... Closed
Related
related to SERVER-39692 Make graceful MongoS shutdown drain a... Closed
is related to SERVER-39187 Rerunning commitTransaction on a new ... Closed
is related to SERVER-39726 Recovering the state of an uncommitte... Closed
is related to SERVER-39349 Recovering the state of a completed s... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-12-17, Sharding 2018-12-31
Participants:

 Description   
  • the router returns the recovery token on the response to the first statement
  • the client can send the recovery token on any later statements, including 'commitTransaction' and 'abortTransaction', but the router will only make use of the recovery token in commitTransaction, and only if the router does not know about the transaction
  • to make use of the recovery token, the router sends 'recoverTransaction' to the shard identified by the token
  • a shard that receives 'recoverTransaction' returns NoSuchTransaction if it does not know about the transaction. Otherwise, if the decision has already been made, it returns that decision; if the decision has not been made, it decides to abort.
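
The flow in the bullets above can be sketched as a toy model. This is only an illustration in Python, not the server implementation: the `Shard` and `Router` classes, their method names, and the in-memory bookkeeping are all hypothetical, and raising NoSuchTransaction when the router has neither local state nor a token is an assumption that the description does not state.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class NoSuchTransaction(Exception):
    """Raised when a node has no record of the transaction."""


class Shard:
    """Hypothetical shard: maps a transaction id to its decision (None = undecided)."""

    def __init__(self) -> None:
        self.txns: Dict[str, Optional[Decision]] = {}

    def recover_transaction(self, txn_id: str) -> Decision:
        if txn_id not in self.txns:
            raise NoSuchTransaction(txn_id)
        decision = self.txns[txn_id]
        if decision is None:
            # No decision has been made yet, so recovery decides to abort.
            decision = Decision.ABORT
            self.txns[txn_id] = decision
        return decision


class Router:
    """Hypothetical router that only consults the token for unknown transactions."""

    def __init__(self, shards: Dict[str, Shard]) -> None:
        self.shards = shards
        self.known_txns: Set[str] = set()

    def commit_transaction(self, txn_id: str, recovery_token: Optional[dict] = None):
        if txn_id in self.known_txns:
            # The router knows the transaction, so the token is ignored.
            return "regular commit path"
        if recovery_token is None:
            # Assumption: nothing to recover from, so report an unknown transaction.
            raise NoSuchTransaction(txn_id)
        # Send 'recoverTransaction' to the shard identified by the token.
        shard = self.shards[recovery_token["shardId"]]
        return shard.recover_transaction(txn_id)


shards = {"shard1": Shard()}
shards["shard1"].txns["decided"] = Decision.COMMIT
shards["shard1"].txns["undecided"] = None
router = Router(shards)
token = {"shardId": "shard1"}
print(router.commit_transaction("decided", token))
print(router.commit_transaction("undecided", token))
```

Only the router reads the token's contents here; a client would simply hand back whatever it received.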


 Comments   
Comment by Kaloian Manassiev [ 13/Feb/19 ]

Yes, it is a safe assumption (and it should be a check perhaps on the driver side) that the recovery token should always be a BSON object.
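
A driver-side check along those lines might look like the sketch below. It is written in Python purely for illustration; the function name `check_recovery_token` is hypothetical, and it assumes the driver surfaces a decoded BSON document as a mapping type such as `dict`.

```python
from collections.abc import Mapping
from typing import Any


def check_recovery_token(token: Any) -> Mapping:
    """Return the token unchanged if it is a document, otherwise raise TypeError."""
    # Assumption: a decoded BSON document is represented as a Mapping (e.g. dict).
    if not isinstance(token, Mapping):
        raise TypeError(
            "recoveryToken must be a BSON document, got " + type(token).__name__
        )
    return token
```

The check only looks at the outer type and leaves the fields inside the token uninspected.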

Comment by Shane Harvey [ 11/Feb/19 ]

renctan can drivers rely on the fact that the recoveryToken will always be a BSON document and not some other BSON type? This assumption would simplify some driver implementations.

Comment by Randolph Tan [ 17/Dec/18 ]

Drivers are going to be testing the recoveryToken against single shard clusters. Would you be able to expand a little more on what the "certain edge cases" are?

shane.harvey My original comment was slightly off, and I revised it to:

single shard transactions don't send coordinateCommit. The current behavior is that attempting to recover a single shard transaction commit waits for the coordinator to time out and then checks the commit state of the transaction locally.

Comment by Shane Harvey [ 17/Dec/18 ]

Because of this, this currently doesn't work for single shard transactions on certain edge cases.

Drivers are going to be testing the recoveryToken against single shard clusters. Would you be able to expand a little more on what the "certain edge cases" are?

Comment by Randolph Tan [ 17/Dec/18 ]

Recovery token on response to multistatement transactions from mongos:

{
  ok: 1,
  recoveryToken: {
    shardId: "shard1",
  }
}

To use recovery token on commit:

{
  commitTransaction: 1,
  ...
  recoveryToken: {
    shardId: "shard1",
  }
}

Notes:

  • the recovery token should be treated as an opaque object and passed back to the server as is.
  • the recovery token only exists if the multistatement transaction statement ran successfully without errors.
  • commit recovery is best effort. If coordinateCommit was never sent to the coordinator, the recovery commit will time out waiting for it.
  • single shard transactions don't send coordinateCommit. The current behavior is that attempting to recover a single shard transaction commit waits for the coordinator to time out and then checks the commit state of the transaction locally.
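
On the client side, the first two notes can be sketched roughly as follows. This is a minimal Python illustration, not any driver's actual API: the `TxnSession` class and its methods are hypothetical, and the other fields of the commit command (elided as `...` in the example above) are left out here as well.

```python
from typing import Any, Dict, Optional


class TxnSession:
    """Hypothetical client-side holder for the most recent recovery token."""

    def __init__(self) -> None:
        self.recovery_token: Optional[Dict[str, Any]] = None

    def observe_reply(self, reply: Dict[str, Any]) -> None:
        # Only successful statements carry a token; otherwise keep the previous one.
        if reply.get("ok") == 1 and "recoveryToken" in reply:
            self.recovery_token = reply["recoveryToken"]

    def build_commit(self) -> Dict[str, Any]:
        # Other command fields are omitted, as in the example above.
        cmd: Dict[str, Any] = {"commitTransaction": 1}
        if self.recovery_token is not None:
            # Passed back as is; the client never inspects the token's fields.
            cmd["recoveryToken"] = self.recovery_token
        return cmd


session = TxnSession()
session.observe_reply({"ok": 1, "recoveryToken": {"shardId": "shard1"}})
session.observe_reply({"ok": 0})  # a failed statement carries no token
print(session.build_commit())
```

Because the token travels with the command, the commit built here could be sent to a different mongos than the one that ran the earlier statements.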
Comment by Githook User [ 17/Dec/18 ]

Author: Randolph Tan (username: renctan, email: randolph@10gen.com)

Message: SERVER-37344 Implement recovery token for retrying a commit command on a different mongos
Branch: master
https://github.com/mongodb/mongo/commit/c82cee47c3208a75f928ad3c87cc3db9a23b0f38

Generated at Thu Feb 08 04:45:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.