[SERVER-36853] Coordinator should resume coordinating commit for unfinished transactions on stepup Created: 24/Aug/18  Updated: 29/Oct/23  Resolved: 11/Dec/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.7

Type: Task Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Fixed Votes: 0
Labels: ShardedTxn:DistributedCommit, transaction-coordinator-management
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-36585 Make TransactionCoordinator write dec... Closed
is duplicated by SERVER-37883 Handle failures to make participant l... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-10-22, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2018-12-17
Participants:
Linked BF Score: 0

 Description   

Server changes:

  1. On step-up,
    1. Wait for the coordinator catalog to drain.
    2. Scan the coordinators table and for each document, if a corresponding TransactionCoordinator does not exist in the TransactionCoordinatorCatalog, create a TransactionCoordinator from the document, insert it into the catalog, and schedule a task to continue driving it to completion.
  2. Prevent external requests from reading the TransactionCoordinatorCatalog after step-up until the catalog has been populated from the durable state.
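
The step-up sequence above can be sketched roughly as follows. This is a minimal single-process Python model, not the server's C++ implementation. All names here (`CoordinatorCatalog`, `recover_on_step_up`, `schedule`, and the document fields `_id` and `participants`) are hypothetical and chosen only for illustration, and the drain wait from step 1.1 is left out.

```python
import threading


class CoordinatorCatalog:
    """Toy stand-in for TransactionCoordinatorCatalog (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._coordinators = {}
        # Unset while recovery is in progress; external readers wait on it.
        self._recovered = threading.Event()

    def begin_step_up(self):
        # Close the gate so external reads block until recovery finishes.
        self._recovered.clear()

    def insert_if_absent(self, txn_id, coordinator):
        # Returns True only if this call actually inserted the coordinator.
        with self._lock:
            if txn_id in self._coordinators:
                return False
            self._coordinators[txn_id] = coordinator
            return True

    def mark_recovered(self):
        # Open the gate: the catalog now reflects the durable state.
        self._recovered.set()

    def get(self, txn_id, timeout=None):
        # External read path: blocks until the catalog has been populated.
        if not self._recovered.wait(timeout):
            raise TimeoutError("catalog not yet populated from durable state")
        with self._lock:
            return self._coordinators.get(txn_id)


def recover_on_step_up(catalog, coordinator_docs, schedule):
    """Rebuild coordinators from durable documents, then admit readers."""
    catalog.begin_step_up()
    for doc in coordinator_docs:
        coordinator = {"txn_id": doc["_id"], "participants": doc["participants"]}
        # Only create and schedule a coordinator if none exists for this document.
        if catalog.insert_if_absent(doc["_id"], coordinator):
            schedule(coordinator)
    catalog.mark_recovered()
```

In this model, `schedule` stands for whatever executor continues driving a coordinator to completion. Gating reads on a single event is one simple way to satisfy point 2; the real server may serialize requests differently.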

If this ticket is completed before "prepare" is ready for failover testing, additional server changes:

  • Add an override to mongos to send coordinateCommit to the config server
  • Add an override to the config server to not remove TransactionCoordinators from the catalog on finishing coordinating a commit (because a coordinateCommit retry from mongos will not be able to recover the decision from a local participant)

These overrides should be removed under SERVER-37886.
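
As a rough illustration of why the second override matters, the toy model below compares a catalog that keeps finished coordinators with one that removes them. The class `Catalog`, its methods, and the `keep_finished_coordinators` flag are hypothetical names for this sketch only and do not correspond to actual server parameters.

```python
class Catalog:
    """Toy coordinator catalog keyed by transaction id (hypothetical)."""

    def __init__(self, keep_finished_coordinators):
        # Stands in for the temporary config-server override.
        self.keep_finished_coordinators = keep_finished_coordinators
        self._entries = {}

    def start(self, txn_id):
        # A coordinator begins with no decision.
        self._entries[txn_id] = {"decision": None}

    def finish(self, txn_id, decision):
        # Record the decision; drop the entry unless the override is on.
        self._entries[txn_id]["decision"] = decision
        if not self.keep_finished_coordinators:
            del self._entries[txn_id]

    def coordinate_commit_retry(self, txn_id):
        # A retry can only learn the decision if the entry is still present.
        entry = self._entries.get(txn_id)
        return None if entry is None else entry["decision"]
```

With the override on, a retry after `finish("t1", "commit")` still returns `"commit"`. With it off, the retry returns `None`, which models the case where the decision would have to be recovered from a local participant.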

Testing

  • Same node steps up after stepping down
  • Different node steps up after stepping down


 Comments   
Comment by Githook User [ 12/Dec/18 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-36853 Explicitly detach the futures chain in order to avoid unchecked return value warning
Branch: master
https://github.com/mongodb/mongo/commit/76c82f567ff8783bd37fa1a34d62cecf0a367599

Comment by Githook User [ 12/Dec/18 ]

Author:

{'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'}

Message: SERVER-36853 coordinateCommitTransaction should set the Client's last OpTime to the system last OpTime before returning a decision
Branch: master
https://github.com/mongodb/mongo/commit/3b7720f9a4209e9b4b18f7d8e98f29b574775f76

Comment by Githook User [ 11/Dec/18 ]

Author:

{'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'}

Message: SERVER-36853 Coordinator should resume coordinating commit for unfinished transactions on stepup
Branch: master
https://github.com/mongodb/mongo/commit/f948fbcf86d104d70889d7d6a1caa83b4d78a6a8

Comment by Kaloian Manassiev [ 29/Nov/18 ]

The proposed implementation looks good, just a couple of questions:

For 1.1 above - this will all happen in the onDrainComplete phase, right? I believe creating a new coordinator doesn't take any locks. Can you remind me how a new 'coordinateCommit' request issued against a primary will serialize with that recovery process, so that in the end the transaction coordinator in the catalog reflects what's on disk? From what I remember, replication will accept PrimaryOnly requests in this case. For the transaction participant, this is ensured by the checkoutSession mechanism.

For 1.2 above - this will happen when the drain mode is completed, right? Remind me why we need to join rather than just leaving it to the commit pipeline to run in the background.

Generated at Thu Feb 08 04:44:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.