[SERVER-60682] TransactionCoordinator may block acquiring WiredTiger write ticket to persist its decision, prolonging transactions being in the prepared state
Created: 13/Oct/21 Updated: 07/Nov/23 Resolved: 17/Nov/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Sharding |
| Affects Version/s: | 4.2.0, 4.4.0, 5.0.0, 5.1.0-rc0 |
| Fix Version/s: | 5.2.0, 5.1.2, 5.0.6, 4.4.11, 4.2.19 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Max Hirschhorn | Assignee: | Josef Ahmad |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.1, v5.0, v4.4, v4.2 |
| Sprint: | Execution Team 2021-11-29 |
| Case: | (copied to CRM) |
| Description |
The TransactionCoordinator performs an update to the config.transaction_coordinators collection on the local shard to write down its commit or abort decision for the cross-shard transaction. During this step of the two-phase commit coordination, the cross-shard transaction is in the prepared state on the participant shards, so other multi-statement transactions can hit a prepare conflict and must wait for the cross-shard transaction to commit or abort. These waiting transactions block while holding storage resources, including a WiredTiger write ticket. If enough of them are waiting, all WiredTiger write tickets in the system can be temporarily exhausted, and the TransactionCoordinator then blocks while trying to acquire a ticket for its own decision write. Until it gets one, it cannot deliver the decision, so the participant shards stay in the prepared state longer.

It would be less disruptive to the system if the TransactionCoordinator could still write down its decision locally in this situation, so that it can more rapidly deliver the decision to the participant shards and clear their prepared state. Note that once transactionLifetimeLimitSeconds has elapsed (default: 60 seconds), the multi-statement transactions holding the WiredTiger write tickets are aborted and release their tickets, which allows the TransactionCoordinator to acquire one.
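The exhaustion scenario behaves like a counting semaphore whose permits are all held by waiters that cannot make progress. The following is a minimal C++ sketch under stated assumptions: `TicketPool`, `acquire`, `release`, and `available` are hypothetical names, not MongoDB's ticketing API, and the bounded wait in `acquire` stands in for the coordinator blocking until transactionLifetimeLimitSeconds aborts one of the ticket holders.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical model of a fixed-size pool of WiredTiger write tickets.
// Illustration only; this is not MongoDB's TicketHolder API.
class TicketPool {
public:
    explicit TicketPool(int tickets) : _available(tickets) {}

    // Waits up to `timeout` for a free ticket. Returns false if every
    // ticket is still held when the timeout expires.
    bool acquire(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        if (!_cv.wait_for(lk, timeout, [this] { return _available > 0; })) {
            return false;
        }
        --_available;
        return true;
    }

    // Returns a ticket to the pool and wakes one waiter.
    void release() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_available;
        }
        _cv.notify_one();
    }

    int available() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _available;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    int _available;
};
```

In this model, two transactions stuck on a prepare conflict take both tickets of a two-ticket pool, a third `acquire` call (the coordinator's decision write) times out, and it only succeeds after one of the stuck transactions calls `release()`, mirroring an abort after transactionLifetimeLimitSeconds.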
| Comments |
| Comment by Githook User [ 21/Dec/21 ] |
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)
Message: Co-authored-by: Louis Williams <louis.williams@mongodb.com>
| Comment by Githook User [ 10/Dec/21 ] |
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)
Message: Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
| Comment by Githook User [ 07/Dec/21 ] |
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)
Message: Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
| Comment by Githook User [ 06/Dec/21 ] |
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)
Message: Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
| Comment by Githook User [ 17/Nov/21 ] |
Author: Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)
Message: Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
| Comment by Max Hirschhorn [ 14/Oct/21 ] |
It looks like this does need to go back to the Storage Execution team: after the decision is written, waitForMajorityWithHangFailpoint() also ends up blocking, because the JournalFlusher is itself waiting to acquire a WiredTiger write ticket.
| Comment by Max Hirschhorn [ 13/Oct/21 ] |
Edit: Actually, it looks like SkipTicketAcquisitionForLock can already be used to do this.
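The idea of letting a critical write bypass the ticket pool can be sketched as a scoped (RAII) opt-out flag on the operation. This is a minimal C++ model under stated assumptions: `TicketPool`, `Operation`, `SkipTicketGuard`, and `acquireWriteLock` are hypothetical names, and the sketch does not reproduce the actual signature or behavior of SkipTicketAcquisitionForLock; it only illustrates the general pattern.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical non-blocking ticket pool (not MongoDB's TicketHolder API).
class TicketPool {
public:
    explicit TicketPool(int tickets) : _available(tickets) {}

    bool tryAcquire() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_available == 0) return false;
        --_available;
        return true;
    }

    void release() {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_available;
    }

private:
    std::mutex _mutex;
    int _available;
};

// Hypothetical per-operation state standing in for an operation context.
struct Operation {
    bool skipTicketAcquisition = false;
    bool holdsTicket = false;
};

// Hypothetical RAII guard: while in scope, lock acquisition for `op` does not
// take a ticket. The previous setting is restored on destruction so guards nest.
class SkipTicketGuard {
public:
    explicit SkipTicketGuard(Operation& op) : _op(op), _previous(op.skipTicketAcquisition) {
        _op.skipTicketAcquisition = true;
    }
    ~SkipTicketGuard() { _op.skipTicketAcquisition = _previous; }
    SkipTicketGuard(const SkipTicketGuard&) = delete;
    SkipTicketGuard& operator=(const SkipTicketGuard&) = delete;

private:
    Operation& _op;
    bool _previous;
};

// Hypothetical write-lock acquisition: takes a ticket unless the operation opted out.
bool acquireWriteLock(Operation& op, TicketPool& pool) {
    if (op.skipTicketAcquisition) {
        return true;  // Bypass: proceed without consuming a ticket.
    }
    if (!pool.tryAcquire()) {
        return false;  // All tickets are held; a real system would wait here.
    }
    op.holdsTicket = true;
    return true;
}
```

With a one-ticket pool already held by a transaction stuck on a prepare conflict, the coordinator's `acquireWriteLock` fails on its own but succeeds inside a `SkipTicketGuard` scope without taking a ticket. A bypass like this lets the operation exceed the configured concurrency limit, so it is presumably best reserved for short internal writes that unblock others, such as the decision write or the journal flush it waits on.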