[SERVER-37881] Coordinator should time out waiting for prepare responses and decide to abort Created: 01/Nov/18  Updated: 29/Oct/23  Resolved: 03/Apr/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Task Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: ShardedTxn:DistributedCommit, transaction-coordinator-management
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-38522 All the coordinator asynchronous task... Closed
Duplicate
is duplicated by SERVER-36679 Add timer on TransactionCoordinator t... Closed
Related
related to SERVER-60685 TransactionCoordinator may interrupt ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-04-08

 Description   

There are three phases in the lifetime of a TransactionCoordinator object:

  1. Created, but the coordinateCommit command has not yet been received
  2. Prepare was sent, but no decision has been made yet because votes have not been received from all participants
  3. A decision was made and commit was sent to the participants, but acknowledgements have not yet been received from all of them

Phases 1 and 2 can be cancelled (timed out), but phase 3 cannot. This ticket introduces an upper bound on how long phases 1 and 2 can take before the coordinator unilaterally decides to abort.
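A minimal Python sketch of the cancellation rule above. The `Phase` enum and the `on_deadline_expired` function are hypothetical names chosen for illustration; they are not the server's TransactionCoordinator API.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical names for the three coordinator phases listed above.
    WAITING_FOR_COORDINATE_COMMIT = 1  # created, coordinateCommit not yet received
    WAITING_FOR_VOTES = 2              # prepare sent, some votes still missing
    DECISION_MADE = 3                  # decision sent, awaiting acknowledgements


def on_deadline_expired(phase: Phase) -> str:
    """Return the (hypothetical) coordinator action when its deadline passes."""
    if phase in (Phase.WAITING_FOR_COORDINATE_COMMIT, Phase.WAITING_FOR_VOTES):
        # No decision exists yet, so the coordinator may unilaterally abort.
        return "abort"
    # Phase 3: a decision was already made and must be delivered,
    # so the deadline has no effect.
    return "continue"


print(on_deadline_expired(Phase.WAITING_FOR_VOTES))  # abort
print(on_deadline_expired(Phase.DECISION_MADE))      # continue
```

The point of the split is that only phases without a decision are safe to cancel; once participants may have been told to commit, abandoning delivery would leave them inconsistent.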

The upper bound for phases 1 and 2 combined will be the transactionLifetimeLimitSeconds server parameter (which defaults to 1 minute). This means that if, within transactionLifetimeLimitSeconds of the transaction's start, the coordinator has not received coordinateCommit or has not been able to reach a decision, the transaction will abort.
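The deadline rule can be sketched as follows, assuming timestamps are plain seconds measured from an arbitrary clock. The function names `coordinator_deadline` and `must_abort` are hypothetical, and whether the boundary instant itself counts as expired (`>=` here) is an assumption.

```python
# Documented default for transactionLifetimeLimitSeconds (1 minute).
DEFAULT_LIFETIME_LIMIT_SECONDS = 60.0


def coordinator_deadline(txn_start: float,
                         limit_seconds: float = DEFAULT_LIFETIME_LIMIT_SECONDS) -> float:
    """Absolute time (in seconds) by which phases 1 and 2 must finish."""
    return txn_start + limit_seconds


def must_abort(now: float, txn_start: float, decision_made: bool,
               limit_seconds: float = DEFAULT_LIFETIME_LIMIT_SECONDS) -> bool:
    """True if the coordinator should unilaterally decide to abort."""
    if decision_made:
        # Phase 3: a decision already exists and must not be overridden.
        return False
    # Phases 1 and 2: abort once the lifetime deadline has been reached.
    return now >= coordinator_deadline(txn_start, limit_seconds)


# Transaction started at t=100s with the default 60s limit.
print(must_abort(now=159.0, txn_start=100.0, decision_made=False))  # False
print(must_abort(now=160.0, txn_start=100.0, decision_made=False))  # True
print(must_abort(now=200.0, txn_start=100.0, decision_made=True))   # False
```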

If a coordinateCommit command is received with a maxTimeMS greater than the time remaining until transactionLifetimeLimitSeconds elapses (measured from the transaction's start), the command's effective maxTimeMS is capped at that remaining time.
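A sketch of the capping rule, assuming all times are integer milliseconds. The name `effective_max_time_ms` is hypothetical, and treating a missing maxTimeMS (`None`) as "only the lifetime limit applies" is an assumption made for illustration.

```python
from typing import Optional


def effective_max_time_ms(max_time_ms: Optional[int], txn_start_ms: int,
                          now_ms: int, limit_seconds: int = 60) -> int:
    """Cap a coordinateCommit maxTimeMS at the transaction's remaining lifetime."""
    deadline_ms = txn_start_ms + limit_seconds * 1000
    remaining_ms = max(0, deadline_ms - now_ms)  # never negative
    if max_time_ms is None:
        # Assumption: with no maxTimeMS supplied, only the lifetime limit applies.
        return remaining_ms
    return min(max_time_ms, remaining_ms)


# Transaction started at 0 ms; coordinateCommit arrives at 45,000 ms.
print(effective_max_time_ms(30_000, txn_start_ms=0, now_ms=45_000))  # 15000 (capped)
print(effective_max_time_ms(10_000, txn_start_ms=0, now_ms=45_000))  # 10000 (unchanged)
```

Taking the minimum means a large client-supplied maxTimeMS can never extend the coordinator past the transaction lifetime, which matches the concern below about locks being held for a long time.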



 Comments   
Comment by Githook User [ 03/Apr/19 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-37881 Add a deadline for the coordinator preparing a transaction
Branch: master
https://github.com/mongodb/mongo/commit/fecb661b5ebfe1c5e8265db34abc240004d55bf0

Comment by Shane Harvey [ 06/Mar/19 ]

If a coordinateCommit command is received with maxTimeMS greater than what is left of transactionLifetimeLimitSeconds since the transaction started, the effective maxTimeMS of the coordinateCommit command will be what is left of transactionLifetimeLimitSeconds.

I think that makes perfect sense.

Comment by Kaloian Manassiev [ 06/Mar/19 ]

alyson.cabral, shane.harvey: I updated the description of this ticket with the semantics transactionLifetimeLimitSeconds would have for distributed transactions. This somewhat contradicts what is in this comment, but I think it would be better to cap the transaction coordinator's upper bound rather than risk large maxTimeMS values causing locks to be held for a long time.

Comment by Shane Harvey [ 28/Jan/19 ]

My understanding is that this ticket will also implement this requirement from the cross-shard transactions design:

The maxTimeMS on a user's 'commitTransaction' will override the coordinator's default 'transactionLifetimeLimitSeconds' timeout

Generated at Thu Feb 08 04:47:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.