[SERVER-38806] Reduce time spent blocking new transaction coordination requests after coordinator failover Created: 02/Jan/19  Updated: 06/Dec/22  Resolved: 24/May/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: ShardedTxn:DistributedCommit, ShardedTxn:FutureOptimizations, pm-564
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding
Participants:

Description

(Old title: Allow finer grained control of coordinator re-creation on step-up)

On step-up, we launch an asynchronous task that reads all documents from the coordinators collection and recreates a TransactionCoordinator object in memory for each one, so that each transaction's commit process continues where it left off. Currently, every operation that accesses the TransactionCoordinatorCatalog during this time blocks until the task completes. As a result, any operation that tries to start coordinating a new transaction or to commit an existing one waits behind the recovery task, which can be slow because it requires a full collection scan. We should benchmark how long this is expected to take and, if necessary, improve the concurrency mechanism around re-creating coordinators on step-up.
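A minimal Python sketch of the coarse-grained blocking described above, assuming a heavily simplified model. Apart from the TransactionCoordinatorCatalog and TransactionCoordinator names taken from the description, everything here is hypothetical and does not reflect the server's C++ implementation: the method names `on_step_up` and `get_or_create`, the document shape (`key`, `step`), the step label, and the simulated per-document cost. A request for a transaction that has no coordinator document still waits for the entire recovery to finish.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class TransactionCoordinator:
    # Hypothetical stand-in: `key` identifies a transaction (session id, txn number)
    # and `step` records where the commit process left off.
    key: tuple
    step: str = "new"


class TransactionCoordinatorCatalog:
    """Simplified model of the coarse-grained behavior: every catalog access
    waits until the step-up recovery task has processed every coordinator document."""

    def __init__(self):
        self._lock = threading.Lock()
        self._coordinators = {}
        self._recovery_done = threading.Event()
        self._recovery_done.set()  # nothing to recover before the first step-up

    def on_step_up(self, coordinator_docs, per_doc_cost_s=0.0):
        """Launch an asynchronous task that recreates one coordinator per document."""
        # Clear the gate before the task starts so that no request can slip
        # through before recovery has begun.
        self._recovery_done.clear()

        def recover():
            for doc in coordinator_docs:  # simulates the full collection scan
                time.sleep(per_doc_cost_s)  # simulated per-document read cost
                with self._lock:
                    self._coordinators[doc["key"]] = TransactionCoordinator(doc["key"], doc["step"])
            self._recovery_done.set()

        task = threading.Thread(target=recover, daemon=True)
        task.start()
        return task

    def get_or_create(self, key):
        # Coarse-grained gate: wait for the whole recovery to finish, even when
        # `key` has no coordinator document to recover.
        self._recovery_done.wait()
        with self._lock:
            return self._coordinators.setdefault(key, TransactionCoordinator(key))


# Usage: a request for a brand-new transaction arrives right after step-up.
catalog = TransactionCoordinatorCatalog()
docs = [{"key": (f"lsid-{i}", 1), "step": "waitingForVotes"} for i in range(50)]
start = time.monotonic()
catalog.on_step_up(docs, per_doc_cost_s=0.002)
coord = catalog.get_or_create(("new-lsid", 1))  # unrelated to any recovered document
elapsed = time.monotonic() - start
print(f"new coordination request waited {elapsed:.3f}s behind recovery")
```

The timing in the usage lines is only a crude stand-in for the benchmark the description asks for: the wait grows linearly with the number of coordinator documents, regardless of whether the incoming request touches any of them. A finer-grained mechanism, which this ticket did not specify, would narrow the gate so that a request waits only when its own transaction might still have a coordinator being recovered.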


Generated at Thu Feb 08 04:50:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.