[SERVER-41317] Push commitTransaction's check for a majority-committed prepare down into the TransactionParticipant Created: 24/May/19 Updated: 29/Oct/23 Resolved: 29/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.13 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vesselina Ratcheva (Inactive) | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2019-06-03 |
| Participants: | |
| Linked BF Score: | 20 |
| Description |
|
We can avoid this deadlock by moving the check into TransactionParticipant::commitPreparedTransaction, soon after the RSTL acquisition. Stepdown would already be holding the RSTL, and the commit would be interruptible while waiting to acquire it. |
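The ordering proposed above can be illustrated with a small, language-neutral sketch. It is not the server's C++ code: `Participant`, `Interrupted`, `PrepareNotMajorityCommitted`, `commit_during_stepdown`, the integer timestamps, and the polling loop are all hypothetical stand-ins, and a plain `threading.Lock` plays the role of the RSTL. The point is only that the majority-commit check runs after an interruptible lock wait, so an operation killed by stepdown exits instead of blocking.

```python
import threading


class Interrupted(Exception):
    """Raised when an operation is killed while waiting for the lock."""


class PrepareNotMajorityCommitted(Exception):
    """Raised when commit arrives before the prepare is majority-committed."""


class Participant:
    """Hypothetical stand-in for a transaction participant."""

    def __init__(self, prepare_ts, majority_commit_ts):
        self.rstl = threading.Lock()  # stand-in for the RSTL
        self.prepare_ts = prepare_ts
        self.majority_commit_ts = majority_commit_ts

    def _acquire_interruptibly(self, interrupt, poll_secs=0.01):
        # Poll with a short timeout so a kill request is noticed promptly.
        while not self.rstl.acquire(timeout=poll_secs):
            if interrupt.is_set():
                raise Interrupted("killed while waiting for the RSTL")

    def commit_prepared_transaction(self, interrupt):
        self._acquire_interruptibly(interrupt)
        try:
            # The check runs only once the lock is held.
            if self.prepare_ts > self.majority_commit_ts:
                raise PrepareNotMajorityCommitted()
            return "committed"
        finally:
            self.rstl.release()


def commit_during_stepdown():
    """Simulate a commit that arrives while stepdown holds the lock."""
    p = Participant(prepare_ts=5, majority_commit_ts=7)
    interrupt = threading.Event()
    p.rstl.acquire()   # stepdown already holds the lock
    interrupt.set()    # stepdown kills the waiting operation
    try:
        return p.commit_prepared_transaction(interrupt)
    except Interrupted:
        return "interrupted"
    finally:
        p.rstl.release()
```

In this sketch, `commit_during_stepdown()` returns `"interrupted"` rather than hanging, while a commit with no competing lock holder runs the check and either succeeds or raises `PrepareNotMajorityCommitted`.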
| Comments |
| Comment by Githook User [ 29/May/19 ] |
|
Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (username: vessy-mongodb)
Message: |
| Comment by Esha Maharishi (Inactive) [ 29/May/19 ] |
|
Per offline discussion to address my above comment: the assertion did not actually fail; the check inside the assert statement is what caused the deadlock. |
| Comment by Judah Schvimer [ 28/May/19 ] |
|
This assertion should never be hit. It was added as a safeguard against potential bugs in the coordinator, or against bugs in the rollback fuzzer which mimics the coordinator in certain respects. |
| Comment by Esha Maharishi (Inactive) [ 28/May/19 ] |
|
Hmm. How can a participant receive commit before the prepare has been majority-committed? I can only imagine it happening if the participant replica set is split-brained and the newer primary's half has majority-committed the prepare. And that would additionally require the coordinator to have failed over, because a single coordinator mongod should not target an older remote primary (for commit) after targeting a newer remote primary (for prepare)... If nothing like this happened, I'm concerned about why the coordinator sent commit before hearing that prepare had been majority-committed. |