Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.2.0-rc5, 4.3.1
Affects Version/s: None
Component/s: Sharding
Labels:

Backwards Compatibility: Fully Compatible
Backport Requested: v4.2
Sprint: Sharding 2019-04-08, Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-07-29
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

SERVER-37344 implemented recoveryToken support for recovering the outcome of a sharded transaction when running commitTransaction on a recovery mongos (i.e., a mongos which has not seen that transaction and doesn't know the coordinator or the participant list).

When aborting the transaction against a recovery mongos, the driver will still include the recoveryToken (SPEC-1279), but there are situations where the recovery token is not yet known. In those situations, parts of the transaction can remain open for up to the max transaction lifetime, potentially blocking other operations.

In such a case, neither the participants nor the coordinator might be known (especially with read-only shard optimizations), so the only deterministic way of ensuring that the transaction vestiges have been aborted is to broadcast abortTransaction to all shards in the cluster. However, this does not scale and also opens the door to a denial-of-service (DoS) attack, so instead, as part of this ticket, we will do the next best thing:

1. Make the graceful MongoS shutdown logic do a best-effort abortTransaction for all in-progress transaction routers. That way we ensure that maintenance shutdowns do not leave open transactions.
2. Document the cases where in 4.2 we can leave transactions hanging for a minute, and the manual recovery steps that an operator might be able to take to clear that state before the transactions expire. This would be the case where MongoS hard crashes after having started a transaction on a shard, but before any recovery information is returned to the driver.
3. Post-4.2.0, figure out a format for the recovery token that contains the set of shards involved in the transaction so far. The issue to consider here is how large that token can get: shard ids are strings, so it is theoretically possible to exceed the BSON max size.
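The best-effort shutdown behaviour in the first item can be sketched roughly as follows. This is a self-contained simulation, not the MongoS implementation: `Shard`, `TransactionRouter`, and `abort_all_on_shutdown` are hypothetical names introduced only for illustration. The point it shows is that a failure to reach one participant must not prevent the abort from being attempted on the others.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a shard that accepts abortTransaction."""
    name: str
    reachable: bool = True
    aborted: list = field(default_factory=list)

    def abort_transaction(self, txn_id: str) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.aborted.append(txn_id)


@dataclass
class TransactionRouter:
    """Hypothetical router state: the shards one transaction has touched."""
    txn_id: str
    participants: list


def abort_all_on_shutdown(routers):
    """Best effort: try every participant and keep going after a failure.

    Returns the (txn_id, shard_name) pairs that could not be aborted; those
    transactions are left to expire on the shard.
    """
    failures = []
    for router in routers:
        for shard in router.participants:
            try:
                shard.abort_transaction(router.txn_id)
            except ConnectionError:
                failures.append((router.txn_id, shard.name))
    return failures


shard_a, shard_b = Shard("shardA"), Shard("shardB", reachable=False)
routers = [
    TransactionRouter("txn-1", [shard_a, shard_b]),
    TransactionRouter("txn-2", [shard_a]),
]
failures = abort_all_on_shutdown(routers)
```

In this sketch the unreachable shard ends up in `failures`, which corresponds to the case the second item proposes to document: the transaction stays open there until it expires or an operator clears it.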
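To get a feel for the size concern in the third item, the encoded size of a shard list can be estimated by hand. The sketch below assumes, purely for illustration, that the token would carry the shard ids as a plain BSON array of strings; the ticket does not specify a format, and the shard id pattern `shardNNNN` is made up. The byte counts follow the BSON encoding rules for arrays and strings, and 16 MiB is the BSON document size limit.

```python
BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # BSON document size limit (16 MiB)


def bson_string_array_size(strings):
    """Encoded size in bytes of a BSON array of UTF-8 strings.

    A BSON array is a document keyed by "0", "1", ...:
    int32 total length + elements + trailing 0x00.
    Each string element is:
    type byte + key cstring + int32 length + UTF-8 bytes + 0x00.
    """
    size = 4 + 1  # int32 length prefix and document terminator
    for index, value in enumerate(strings):
        key_bytes = len(str(index)) + 1  # decimal index plus null terminator
        value_bytes = 4 + len(value.encode("utf-8")) + 1
        size += 1 + key_bytes + value_bytes
    return size


# Hypothetical cluster: 1000 shards with 9-character ids.
shard_ids = [f"shard{i:04d}" for i in range(1000)]
token_bytes = bson_string_array_size(shard_ids)
fits = token_bytes < BSON_MAX_DOCUMENT_BYTES
```

Under these assumptions a 1000-shard list comes to a little under 19 KB, far below the limit, so overflow would need very long shard ids, a very large cluster, or both. The estimate covers only the array itself and ignores the rest of the command that would carry the token.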

is related to

SERVER-37344 Implement recovery token for retrying a commit command on a different mongos (Closed)
SERVER-3744 --profile should create system.profile collections as needed (Closed)

Assignee: Randolph Tan
Reporter: Shane Harvey
Participants: Andy Schwerin, Esha Maharishi, Githook User, Gregory McKeon, Kaloian Manassiev, Randolph Tan, Shane Harvey
Votes: 0
Watchers: 7

Created: Feb 20 2019 07:40:28 PM UTC
Updated: Oct 29 2023 10:23:49 PM UTC
Resolved: Jul 18 2019 06:09:36 PM UTC
