[SERVER-37881] Coordinator should time out waiting for prepare responses and decide to abort Created: 01/Nov/18 Updated: 29/Oct/23 Resolved: 03/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Matthew Saltz (Inactive) | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ShardedTxn:DistributedCommit, transaction-coordinator-management | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Sprint: | Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-04-08 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Description |
|
There are three stages in the lifetime of a TransactionCoordinator object:
Phases 1 and 2 can be cancelled (timed out), but phase 3 cannot. This ticket introduces an upper bound on how long phases 1 and 2 can take before the coordinator unilaterally decides to abort. The upper bound for phases 1 and 2 combined will be the value of the transactionLifetimeLimitSeconds parameter (which defaults to 1 minute). This means that if a commit is not received, or a decision cannot be made, within transactionLifetimeLimitSeconds of the transaction starting, the transaction will abort. If a coordinateCommit command is received with a maxTimeMS greater than what is left of transactionLifetimeLimitSeconds since the transaction started, the effective maxTimeMS of the coordinateCommit command will be what is left of transactionLifetimeLimitSeconds. |
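The capping rule described above can be sketched as follows. This is only an illustrative Python model of the arithmetic, not the server implementation: the function names, the millisecond timestamps, and the treatment of a missing maxTimeMS (taken here to mean "no client-side limit", so only the lifetime cap applies) are assumptions made for the example.

```python
from typing import Optional

# Default of transactionLifetimeLimitSeconds as stated in the description (1 minute).
DEFAULT_LIFETIME_LIMIT_SECONDS = 60


def remaining_lifetime_ms(txn_start_ms: int, now_ms: int,
                          lifetime_limit_s: int = DEFAULT_LIFETIME_LIMIT_SECONDS) -> int:
    """Milliseconds left of the lifetime limit since the transaction started (never negative)."""
    elapsed_ms = now_ms - txn_start_ms
    return max(0, lifetime_limit_s * 1000 - elapsed_ms)


def effective_max_time_ms(requested_max_time_ms: Optional[int],
                          txn_start_ms: int, now_ms: int,
                          lifetime_limit_s: int = DEFAULT_LIFETIME_LIMIT_SECONDS) -> int:
    """Cap the maxTimeMS of a coordinateCommit at what is left of the lifetime limit.

    Assumption: None means the client gave no maxTimeMS, so only the cap applies.
    """
    remaining = remaining_lifetime_ms(txn_start_ms, now_ms, lifetime_limit_s)
    if requested_max_time_ms is None:
        return remaining
    return min(requested_max_time_ms, remaining)


def coordinator_should_abort(txn_start_ms: int, now_ms: int,
                             lifetime_limit_s: int = DEFAULT_LIFETIME_LIMIT_SECONDS) -> bool:
    """True once phases 1 and 2 have used up the whole lifetime limit without a decision."""
    return remaining_lifetime_ms(txn_start_ms, now_ms, lifetime_limit_s) == 0
```

For example, a coordinateCommit that arrives 20 seconds into the transaction with a maxTimeMS of 90000 would, under these assumptions, run with an effective limit of 40000 ms, while a request with a maxTimeMS of 5000 keeps its own smaller value.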
| Comments |
| Comment by Githook User [ 03/Apr/19 ] |
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: |
| Comment by Shane Harvey [ 06/Mar/19 ] |
I think that makes perfect sense. |
| Comment by Kaloian Manassiev [ 06/Mar/19 ] |
|
alyson.cabral, shane.harvey: I updated the description of this ticket with what the semantics of transactionLifetimeLimitSeconds would be with respect to distributed transactions. This slightly contradicts what is in this comment, but I think it would be better to cap the transaction coordinator's upper bound than to risk large maxTimeMS values causing locks to be held for a long time. |
| Comment by Shane Harvey [ 28/Jan/19 ] |
|
My understanding is that this ticket will also implement this requirement from the cross-shard transactions design:
|