[SERVER-42772] race between TransactionCoordinatorService::joinPreviousRound and coordinator destruction can trigger invariant Created: 12/Aug/19  Updated: 29/Oct/23  Resolved: 21/Aug/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.7

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Matthew Saltz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Sharding 2019-08-26
Participants:
Linked BF Score: 12

 Description   

Example scenario (from cpp test):

The coordinator gets stored in the function object during commit, promises gets fulfilled, the main test gets unblocked here, coordinator gets removed from catalog and when the CoordinatorService tries to join, it will see that there are no coordinator in the catalog, and then it will still trigger the _quiesced invariant because one of the txnCoordinator shared_ptr is still alive in the function object call back inside the PromiseFuture.

Probably very unlikely to happen in an actual mongod process since it takes a long time before a node would transition to primary after becoming a secondary.



 Comments   
Comment by Githook User [ 22/Apr/20 ]

Author:

{'name': 'Matthew Saltz', 'email': 'matthew.saltz@mongodb.com', 'username': 'saltzm'}

Message: SERVER-42772 Return transaction commit decision from onCompletion future in TransactionCoordinator

(cherry picked from commit 05a99879a6369d6220356d099ac5625d359ffd4d)
Branch: v4.2
https://github.com/mongodb/mongo/commit/1bfea45967885a7f1d4198a1a60cbbc9d75ebf18

Comment by Githook User [ 21/Aug/19 ]

Author:

{'username': 'saltzm', 'email': 'matthew.saltz@mongodb.com', 'name': 'Matthew Saltz'}

Message: SERVER-42772 Return transaction commit decision from onCompletion future in TransactionCoordinator
Branch: master
https://github.com/mongodb/mongo/commit/05a99879a6369d6220356d099ac5625d359ffd4d

Comment by Esha Maharishi (Inactive) [ 13/Aug/19 ]

kaloian.manassiev Hmm. In that case, the TransactionCoordinator should set an error on the completion promise, I think. There is no reason for a TransactionCoordinator to complete with neither a decision nor error.

Comment by Kaloian Manassiev [ 13/Aug/19 ]

SGTM as well, but I just can't remember why we had the completion and the decision as separate methods. I have suspicion that this was because you can have a TC complete, but not produce decision (such as for example on step down or early cancelation).

Comment by Randolph Tan [ 13/Aug/19 ]

esha.maharishi the plan sounds good to me.

Comment by Esha Maharishi (Inactive) [ 13/Aug/19 ]

Hmm. Maybe we should just make the TransactionCoordinator's completion promises Future<txn::CommitDecision> rather than Future<void>, so that TransactionCoordinatorService::coordinateCommit does not need to capture the coordinator in a lambda. renctan, kaloian.manassiev, what do you think?

Generated at Thu Feb 08 05:01:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.