[SERVER-39692] Make graceful MongoS shutdown drain all in-progress transactions Created: 20/Feb/19  Updated: 29/Oct/23  Resolved: 18/Jul/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.0-rc5, 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Shane Harvey Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: ShardedTxn:FutureOptimizations, neweng, pm-564
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-37344 Implement recovery token for retrying... Closed
is related to SERVER-3744 --profile should create system.profil... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Sharding 2019-04-08, Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-07-29
Participants:

 Description   

SERVER-37344 implemented recoveryToken support for recovering the outcome of a sharded transaction when running commitTransaction on a recovery mongos (i.e., a mongos that has not seen that transaction and doesn't know the coordinator or the participant list).

In the case of aborting the transaction against a recovery mongos, the driver will still include the recoveryToken (SPEC-1279), but there are situations where the recovery token might still not be known, which means parts of the transaction could still remain open for up to the max transaction lifetime, potentially blocking other operations.

Since in such a case neither the participants nor the coordinator might be known (especially with read-only shard optimizations), the only deterministic way to ensure that the transaction's vestiges have been aborted is to broadcast abortTransaction to all shards in the cluster. However, this does not scale and also opens the door to a DoS attack, so as part of this ticket we will instead do the next best thing:

  • Make the graceful MongoS shutdown logic do a best-effort abortTransaction for all in-progress transaction routers. That way we ensure that on maintenance shutdowns we will not leave open transactions.
  • Document the cases where, in 4.2, we can leave transactions hanging for a minute, along with the manual recovery steps that an operator might be able to take to clear that state before the transactions expire. That would be the case where MongoS hard-crashes after having started a transaction on a shard, but before any recovery information is returned to the driver.
  • Post-4.2.0, figure out a format for the recovery token that contains the set of shards involved in the transaction so far. The issue to consider here is how large that token can get: shard ids are strings, so it is theoretically possible to exceed the BSON max size.
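
As an illustration of the first bullet only, here is a minimal Python sketch of a best-effort drain at shutdown. It is not the server implementation (which lives in the C++ mongos code); `TransactionRouter`, `RouterRegistry`, `begin`, `drain_on_shutdown`, and the `send_abort` hook are all hypothetical names introduced for this example. The point it shows is that a failure to reach one shard is recorded and skipped rather than allowed to block shutdown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TransactionRouter:
    """Hypothetical stand-in for the per-session transaction state on a mongos."""
    session_id: str
    participants: List[str] = field(default_factory=list)


class RouterRegistry:
    """Hypothetical registry of in-progress transaction routers."""

    def __init__(self, send_abort: Callable[[str, str], None]) -> None:
        # send_abort(shard_id, session_id) is an assumed transport hook that
        # raises on failure; it is not a real MongoDB API.
        self._send_abort = send_abort
        self._routers: Dict[str, TransactionRouter] = {}

    def begin(self, session_id: str, shard_id: str) -> None:
        """Record that a session's transaction has touched a shard."""
        router = self._routers.setdefault(session_id, TransactionRouter(session_id))
        if shard_id not in router.participants:
            router.participants.append(shard_id)

    def drain_on_shutdown(self) -> List[Tuple[str, str]]:
        """Best-effort abort of every known transaction.

        Returns the (session_id, shard_id) pairs whose abort failed; those
        transactions are left to expire on the shard.
        """
        failures: List[Tuple[str, str]] = []
        for router in list(self._routers.values()):
            for shard_id in router.participants:
                try:
                    self._send_abort(shard_id, router.session_id)
                except Exception:
                    # Best effort: note the failure and keep draining.
                    failures.append((router.session_id, shard_id))
            del self._routers[router.session_id]
        return failures
```

Note that this only helps on a graceful shutdown: after a hard crash the registry is gone, which is the case the second bullet proposes to document.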


 Comments   
Comment by Githook User [ 18/Jul/19 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-39692 Make mongos shutdown drain all in-progress transactions

(cherry picked from commit 36dc61299993ce6473a4660150bfb25a59afce77)
Branch: v4.2
https://github.com/mongodb/mongo/commit/52fba7897df46f8a52e590b6cc3a05a24aa93bed

Comment by Githook User [ 18/Jul/19 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-39692 Make mongos shutdown drain all in-progress transactions
Branch: master
https://github.com/mongodb/mongo/commit/36dc61299993ce6473a4660150bfb25a59afce77

Comment by Gregory McKeon (Inactive) [ 10/May/19 ]

The work here is for drivers to address the first bullet, and us to address the second bullet in Kal's comment.

Comment by Kaloian Manassiev [ 15/Apr/19 ]

As Andy describes above, broadcasting abortTransaction to all shards in the cluster does not scale and also opens the door to a DoS attack.

I propose that we do the following:

  • For backwards compatibility purposes with post-4.2.0 fixes, make the drivers include the recoveryToken for both commit and abortTransaction (shane.harvey opened SPEC-1279)
  • Make the graceful MongoS shutdown logic do a best-effort abortTransaction for all in-progress transaction routers. That way we ensure that on maintenance shutdowns we will not leave open transactions (tracked by this ticket)
  • Document the cases where in 4.2 we can leave transactions hanging for a minute. That would be the case where MongoS hard crashes (tracked by this ticket)
  • Post-4.2.0, figure out a format for the recovery token that contains the set of shards involved in the transaction so far. The issue to consider here is how large that token can get: shard ids are strings, so it is theoretically possible to exceed the BSON max size.

alyson.cabral, shane.harvey, does this plan sound good to you?

Comment by Esha Maharishi (Inactive) [ 26/Feb/19 ]

I am unlinking this from SERVER-39726, since aborting a transaction through abortTransaction against a recovery router to expedite releasing the transaction's resources across the cluster is unrelated to avoiding blocking while recovering a transaction's decision through commitTransaction against a recovery router.

Comment by Andy Schwerin [ 21/Feb/19 ]

Broadcast is not an acceptable solution for lost mongos. There are a few other reasons we’ve considered updating the recovery token on subsequent transaction statements, so shane.harvey’s proposal is interesting.

Comment by Shane Harvey [ 20/Feb/19 ]

I vaguely recall schwerin and esha.maharishi saying that a broadcast to all shards approach would not be acceptable.

In any case, I've reduced the scope of SPEC-1168 to have drivers include the recoveryToken only on commitTransaction and never on abortTransaction, so any changes made for this ticket will need driver changes.

Comment by Randolph Tan [ 20/Feb/19 ]

One way to do this is to have abortTransaction have a mode that makes the mongos broadcast to all shards.

Comment by Shane Harvey [ 20/Feb/19 ]

Right now, my proposal in SPEC-1168 is for drivers to track the most recently seen recoveryToken and send the recoveryToken along with commitTransaction as well as abortTransaction. This allows the server to update the recoveryToken to include the participant list and therefore a recovery mongos can then use the recoveryToken field to abort the transaction.

Note: a recoveryToken that includes new participant(s) could be lost due to a network error, so the recovery mongos may leave the transaction open on some participants (unless the abort is broadcast to all shards), but this is still better than leaving the transaction open on all participants.

Even if this is not implemented, the behavior of running abort on a recovery mongos must be defined. Consider that even if drivers remain pinned to the same mongos, the mongos could restart and lose its in-memory transaction state. In this case, the driver sends abort to a recovery mongos without even realizing it.
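
The driver-side tracking proposed in this comment could look roughly like the Python sketch below. It is an illustration only, not code from any driver or from SPEC-1168: `TxnSession`, `observe_reply`, and `end_command` are hypothetical names, and the token is treated as an opaque document whose contents the driver never inspects.

```python
from typing import Any, Dict, Optional


class TxnSession:
    """Hypothetical driver-side session state; not a real driver API."""

    def __init__(self) -> None:
        # Most recently seen recoveryToken, kept as an opaque document.
        self.recovery_token: Optional[Dict[str, Any]] = None

    def observe_reply(self, reply: Dict[str, Any]) -> None:
        """Remember the newest recoveryToken found in a server reply, if any."""
        token = reply.get("recoveryToken")
        if token is not None:
            self.recovery_token = token

    def end_command(self, name: str) -> Dict[str, Any]:
        """Build commitTransaction or abortTransaction, attaching the token."""
        if name not in ("commitTransaction", "abortTransaction"):
            raise ValueError(f"unsupported command: {name}")
        command: Dict[str, Any] = {name: 1}
        if self.recovery_token is not None:
            command["recoveryToken"] = self.recovery_token
        return command
```

If the reply carrying an updated token is lost to a network error, `recovery_token` stays at the older value, which is the partial-abort case described in the note above.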
