[SERVER-41556] Must handle failure to reacquire locks and ticket when unstashing transaction
| Created: 05/Jun/19 | Updated: 29/Oct/23 | Resolved: 15/Jul/19 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.0, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2 |
| Sprint: | Repl 2019-07-01, Repl 2019-07-15, Repl 2019-07-29 |
| Participants: |
| Linked BF Score: | 17 |
| Description |
|
In TxnParticipant::TxnResources::release(OperationContext*), either _locker->restoreWriteUnitOfWorkAndLock() or _locker->reacquireTicket(opCtx) can fail. If _locker->reacquireTicket() fails, the locker may be left in an inconsistent state (holding locks but not the ticket). Further, because of the swap() we do in TransactionParticipant::Participant::_releaseTransactionResourcesToOpCtx, if either call fails we lose the TxnResources object entirely. If the transaction was prepared, it is now in a prepared state without a stash, which results in a crash the next time it is used. If it was not prepared, it is now effectively aborted, though not marked as such. For prepared transactions we need to ensure a failed release() leaves the transaction as-is. This approach would also work for other operations, but since we usually abort on transaction errors, we may want to force an abort in that case. |
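The swap-then-release hazard described above can be illustrated with a minimal, self-contained sketch. This is not the server code: StashedResources, Participant, unstashLossy, unstashPreserving, and stashSurvivesFailedRelease are hypothetical names invented for illustration, and the exception stands in for a failed lock or ticket reacquisition. It only shows why a throwing release() after a swap() loses the stash, and one possible way to leave the transaction as-is on failure.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for TxnResources; not the real server class.
struct StashedResources {
    bool failOnRelease = false;  // simulates a failed lock/ticket reacquire

    void release() {
        if (failOnRelease) {
            throw std::runtime_error("could not reacquire locks or ticket");
        }
    }
};

// Hypothetical stand-in for the transaction participant.
struct Participant {
    std::optional<StashedResources> stash;

    // Pattern described in the ticket: the stash is swapped into a temporary
    // first, so an exception from release() destroys the temporary and the
    // participant is left without a stash.
    void unstashLossy() {
        std::optional<StashedResources> temp;
        std::swap(temp, stash);
        temp->release();
    }

    // One possible remedy (an assumption, not necessarily the committed fix):
    // swap the resources back into the stash before propagating the error.
    void unstashPreserving() {
        std::optional<StashedResources> temp;
        std::swap(temp, stash);
        try {
            temp->release();
        } catch (...) {
            std::swap(temp, stash);  // leave the transaction as it was
            throw;
        }
    }
};

// Returns whether the stash still exists after a release() that fails.
bool stashSurvivesFailedRelease(bool preserving) {
    Participant p;
    p.stash = StashedResources{true};
    try {
        if (preserving) {
            p.unstashPreserving();
        } else {
            p.unstashLossy();
        }
    } catch (const std::runtime_error&) {
        // The failure itself is expected; only the stash state matters here.
    }
    return p.stash.has_value();
}
```

Under these assumptions, stashSurvivesFailedRelease(false) reports that the stash is gone after the failure, while stashSurvivesFailedRelease(true) reports that it is still present. The sketch needs C++17 for std::optional and does not model the locker's partially restored state (locks held without a ticket), which would need separate handling.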
| Comments |
| Comment by Githook User [ 25/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}Message: (cherry picked from commit 2ff54098b19ebc2b4bbf5516de6e6befb46f9fe7) |
| Comment by Githook User [ 25/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}Message: |
| Comment by Suganthi Mani [ 24/Jul/19 ] |
|
Author: {'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'}Message: Revert " This reverts commit e707fd09ef0dadbb33510249732fd38c654da914. |
| Comment by Suganthi Mani [ 24/Jul/19 ] |
|
Author: {'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'}Message: Revert " This reverts commit b7cec5064fb03f1e1f9bd39af35e495facfdcdc9. |
| Comment by Githook User [ 15/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}Message: (cherry picked from commit b7cec5064fb03f1e1f9bd39af35e495facfdcdc9) |
| Comment by Githook User [ 15/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}Message: |
| Comment by Judah Schvimer [ 06/Jun/19 ] |
|
Since this is a 4.2 bug, we should do this next iteration. |
| Comment by Matthew Russotto [ 06/Jun/19 ] |
|
The relevant code for 4.0 is and Since we don't restore yielded locks and don't use a temporary to hold the TxnResources, this bug isn't in 4.0. |
| Comment by Judah Schvimer [ 05/Jun/19 ] |
|
matthew.russotto, can this happen on 4.0 as well? |