Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- oldshardingemea
- shardingemea-qw

Assigned Teams:

Cluster Scalability
Sprint:
Sharding EMEA 2023-02-06, Sharding EMEA 2023-04-17, Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15, Sharding EMEA 2023-05-29, Sharding EMEA 2023-06-12
Story Points:
3
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

During the mongos shutdown procedure, after waiting for the quiesce period, we attempt to abort all outstanding transactions by sending abortTransaction to the relevant shards. This logic was added in ~~SERVER-39692~~ and does not provide strong guarantees.

So I would define this as a "best effort" approach to aborting transactions, but on the other hand we use Shard::RetryPolicy::kIdempotent. As a consequence, if some shard is unreachable (crashed or already shut down), mongos will keep retrying the abortTransaction command for 15 seconds, slowing down the mongos shutdown procedure.
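To make the cost concrete, here is a minimal Python model of a retrying send against a shard that never answers. It is only a sketch: the 15-second budget comes from the description above, while the function names, the fake clock, and the 0.5-second backoff are assumptions for illustration and do not reflect how Shard::RetryPolicy::kIdempotent is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Simulated clock, so the example runs instantly (hypothetical helper)."""
    now: float = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def send_with_retries(send, clock, budget_secs=15.0, backoff_secs=0.5):
    """Call `send` until it succeeds or the retry budget is used up.

    Returns (succeeded, simulated_seconds_spent).
    The backoff value is an assumption, not a mongos setting.
    """
    start = clock.now
    while True:
        if send():
            return True, clock.now - start
        if clock.now - start >= budget_secs:
            return False, clock.now - start
        clock.sleep(backoff_secs)


def unreachable_shard() -> bool:
    """Models a shard that has crashed or already shut down."""
    return False


clock = FakeClock()
ok, spent = send_with_retries(unreachable_shard, clock)
print(ok, spent)
```

With these assumed values, the caller gives up only after the whole simulated budget has elapsed, which is the delay observed during shutdown.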

So in practice, if we don't want to hit this 15-second delay when shutting down a cluster, we must always ensure that mongos is shut down before the shards.
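A test fixture could enforce that ordering with a small helper like the one below. This is a hypothetical sketch: the process records and role names are invented for illustration and are not taken from any existing fixture code.

```python
def teardown_order(processes):
    """Return processes ordered so every mongos stops before any shard.

    `processes` is a list of dicts with "name" and "role" keys
    (a hypothetical shape). Roles other than mongos/shard go last.
    """
    rank = {"mongos": 0, "shard": 1}
    # sorted() is stable, so processes with the same role keep their order.
    return sorted(processes, key=lambda p: rank.get(p["role"], 2))


cluster = [
    {"name": "shard0", "role": "shard"},
    {"name": "mongos0", "role": "mongos"},
    {"name": "shard1", "role": "shard"},
]
ordered_names = [p["name"] for p in teardown_order(cluster)]
print(ordered_names)
```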

My proposal is to make this logic truly best effort by using runFireAndForgetCommand to send the abortTransaction command. This guarantees that an unreachable node will not delay mongos shutdown.
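The idea behind fire-and-forget can be illustrated with a short Python sketch: each command is dispatched without waiting for a reply, so a hung target cannot hold up the caller. This is not how runFireAndForgetCommand is implemented; the function names, shard names, and threading approach here are assumptions chosen only to show the behavior.

```python
import threading
import time


def fire_and_forget(send, targets):
    """Dispatch send(target) on daemon threads and return immediately.

    Replies and errors are never awaited or inspected.
    """
    for target in targets:
        threading.Thread(target=send, args=(target,), daemon=True).start()


# Stands in for an unreachable shard: the send blocks until released.
release = threading.Event()


def send_abort(shard: str) -> None:
    """Hypothetical sender; the 'unreachable' shard never answers."""
    if shard == "unreachable":
        release.wait()


started = time.monotonic()
fire_and_forget(send_abort, ["shard0", "unreachable"])
elapsed = time.monotonic() - started

# Unblock the waiting thread so the example exits cleanly.
release.set()
```

The trade-off is that the sender never learns whether the abort reached the shard, which matches the best-effort intent described above.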

related to

SERVER-73415 Parallelize python test fixture teardown

Closed

Assignee: Unassigned
Reporter: Tommaso Tocci
Participants: Max Hirschhorn, Tommaso Tocci
Votes: 0
Watchers: 6

Created: Jan 26 2023 01:58:03 PM UTC
Updated: Dec 18 2024 04:59:04 PM UTC
