[SERVER-41317] Push commitTransaction's check for a majority-committed prepare down into the TransactionParticipant
Created: 24/May/19  Updated: 29/Oct/23  Resolved: 29/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.13

Type: Bug Priority: Major - P3
Reporter: Vesselina Ratcheva (Inactive) Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-41355 Step down should call yieldLocksForPr... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-06-03
Participants:
Linked BF Score: 20

 Description   

SERVER-40269 introduced a check in the execution of the commitTransaction command that causes the server to uassert if the commit is for a prepare that has not yet been majority committed. To perform that check, the command needs to acquire the replication coordinator mutex while it has its session checked out, so it needs to be interruptible. Otherwise, it can deadlock with stepdown, which holds the replication coordinator mutex while trying to check out sessions to make prepared transactions yield their locks.

We can avoid this deadlock if we move the check into TransactionParticipant::commitPreparedTransaction, soon after the RSTL lock acquisition. Stepdown would already be holding the RSTL, and we would be interruptible while waiting to acquire it.
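
The proposed ordering can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: ToyReplState, ToyParticipant, CommitResult, and the integer Timestamp are hypothetical names, a std::timed_mutex stands in for the RSTL, and a bounded wait stands in for an interruptible lock acquisition. The toy reads the majority commit point directly once the RSTL stand-in is held, and it returns an error value where the real check would uassert.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for illustration; none of these are real MongoDB types.
using Timestamp = std::uint64_t;

struct ToyReplState {
    std::timed_mutex rstl;              // stands in for the RSTL
    Timestamp majorityCommitPoint = 0;  // read only while rstl is held in this toy
};

enum class CommitResult { kCommitted, kInterrupted, kPrepareNotMajorityCommitted };

class ToyParticipant {
public:
    explicit ToyParticipant(Timestamp prepareTs) : _prepareTs(prepareTs) {}

    // Takes the RSTL stand-in first, then performs the majority-commit check.
    CommitResult commitPreparedTransaction(ToyReplState& repl,
                                           std::chrono::milliseconds maxWait) {
        // The bounded wait models interruptibility: if a stepdown holds the
        // lock, this call gives up instead of blocking forever.
        std::unique_lock<std::timed_mutex> lk(repl.rstl, maxWait);
        if (!lk.owns_lock()) {
            return CommitResult::kInterrupted;
        }

        // The check now runs after the lock acquisition, inside the participant.
        if (_prepareTs > repl.majorityCommitPoint) {
            return CommitResult::kPrepareNotMajorityCommitted;
        }

        assert(!_committed && "toy participant commits at most once");
        _committed = true;
        return CommitResult::kCommitted;
    }

    bool committed() const { return _committed; }

private:
    Timestamp _prepareTs;
    bool _committed = false;
};
```

In this model, a commit whose prepare timestamp is ahead of the majority commit point is rejected only after the lock is held, and a caller that cannot get the lock within the wait returns kInterrupted rather than blocking a concurrent stepdown.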



 Comments   
Comment by Githook User [ 29/May/19 ]

Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@10gen.com>

Message: SERVER-41317 Push commit transaction's check for a majority-committed prepare down into the TransactionParticipant
Branch: master
https://github.com/mongodb/mongo/commit/cb45824b458c1b3714a379c5c658e1e89238c03d

Comment by Esha Maharishi (Inactive) [ 29/May/19 ]

Per offline discussion to address my above comment: the assertion did not actually fail, it's just that the check in the assert statement caused the deadlock.

Comment by Judah Schvimer [ 28/May/19 ]

This assertion should never be hit. It was added as a safeguard against potential bugs in the coordinator, or against bugs in the rollback fuzzer which mimics the coordinator in certain respects.

Comment by Esha Maharishi (Inactive) [ 28/May/19 ]

Hmm. How can a participant receive commit before the prepare has been majority committed?

I can only imagine it happening if the participant replica set is split brained and the newer primary's half has majority committed the prepare. And that would additionally require the coordinator to have failed over, because a single coordinator mongod should not target an older remote primary (for commit) after targeting a newer remote primary (for prepare)...

If something like this was not the case, I'm concerned why the coordinator sent commit before hearing that prepare had been majority-committed.
