[SERVER-41980] Non-transactional commands can deadlock with prepared transactions when the tickets are exhausted by the non-transactional write commands. Created: 27/Jun/19 Updated: 29/Oct/23 Resolved: 25/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.0-rc5, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2 |
| Sprint: | Repl 2019-07-15, Repl 2019-07-29 |
| Participants: |
| Description |
|
Let's assume the number of write tickets available = 1. Consider the sequence below. 1) A transaction gets prepared and waits to commit. Once the prepare succeeds on the primary, as part of stashing the lock resources, we release the ticket but keep holding the global lock in IX mode. For a cross-shard transaction, the transaction coordinator keeps retrying the commitTransaction cmd until it succeeds. But due to the above deadlock, there won't be any progress on the primary. The deadlock happens on the primary because the transaction violates the acquisition ordering while unstashing the lock resources: the ticket is acquired with the global lock already held. Note: this is a problem only for prepared txns (commitTransaction cmd + cross-shard transaction combo) and not for unprepared txns, because those transactions get aborted either by the transaction reaper or by a higher transaction number (see |
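The interleaving can be sketched roughly as below. This is a toy model, not server code: `tickets` and `txn_lock` are hypothetical stand-ins, and the middle step (a non-transactional write takes the only ticket and then blocks on some lock the prepared transaction still holds) is an assumption based on the ticket title, not something spelled out in the description above. Non-blocking acquires are used so the sketch reports "would block" instead of actually hanging.

```python
import threading


def commit_blocks_on_ticket() -> bool:
    """Return True if the modeled interleaving ends in a wait-for cycle."""
    tickets = threading.Semaphore(1)  # write tickets available = 1
    txn_lock = threading.Lock()       # hypothetical lock kept by the prepared txn

    # 1) Prepare: the transaction takes a ticket and its lock, then stashes
    #    its lock resources. The ticket is released, the lock stays held.
    tickets.acquire()
    txn_lock.acquire()
    tickets.release()

    # 2) Assumed step: a non-transactional write takes the only ticket and
    #    then needs a lock that the prepared transaction is still holding.
    writer_has_ticket = tickets.acquire(blocking=False)
    writer_has_lock = txn_lock.acquire(blocking=False)

    # 3) commitTransaction unstashes and tries to reacquire a ticket while
    #    its lock is already held. No ticket is left.
    commit_has_ticket = tickets.acquire(blocking=False)

    # The writer waits for the transaction's lock, and the transaction waits
    # for the writer's ticket: neither side can make progress.
    return writer_has_ticket and not writer_has_lock and not commit_has_ticket


deadlocked = commit_blocks_on_ticket()
```

In this model `deadlocked` comes out true because each side holds what the other one needs, which is why retrying the commitTransaction cmd does not help.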
| Comments |
| Comment by Githook User [ 25/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: (cherry picked from commit be06cfaae8872737fe349a8a400f322123307061) |
| Comment by Githook User [ 25/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: |
| Comment by Githook User [ 23/Jul/19 ] |
|
Author: {'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'} Message: Revert " This reverts commit aa4089f9d3abccdf4724c6c49a8bde504359b800. |
| Comment by Githook User [ 23/Jul/19 ] |
|
Author: {'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'} Message: Revert " This reverts commit a5d4ab967af9cbba17e6aa5afadca35927bd74c1. |
| Comment by Githook User [ 22/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: (cherry picked from commit a5d4ab967af9cbba17e6aa5afadca35927bd74c1) |
| Comment by Githook User [ 22/Jul/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'} Message: |
| Comment by Suganthi Mani [ 09/Jul/19 ] |
|
Quick note: this problem also applies to the abortTransaction cmd + cross-shard transaction combo, as the txn coordinator keeps retrying the abortTransaction cmd. |
| Comment by Suganthi Mani [ 08/Jul/19 ] |
|
Spoke with geert.bosch and we agreed on a solution where a commitTransaction cmd does not reacquire the ticket while unstashing the lock resources (like we do in ftdc).
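
A rough sketch of the agreed behavior, under stated assumptions: `StashedTxn`, `unstash` and the `reacquire_ticket` flag are hypothetical names made up for illustration and are not the server's API. The only point is that an unstash which skips the ticket never has to wait on a ticket pool that non-transactional writes have exhausted.

```python
import threading


class StashedTxn:
    """Hypothetical stand-in for a prepared transaction's stashed lock state."""

    def __init__(self, tickets: threading.Semaphore) -> None:
        self.tickets = tickets
        self.holds_ticket = False

    def unstash(self, reacquire_ticket: bool) -> bool:
        """Return True if the command can proceed, False if it would block."""
        if not reacquire_ticket:
            # Agreed fix: proceed with the locks the txn already holds and
            # never touch the ticket pool.
            return True
        # Old behavior: ask for a ticket while locks are already held.
        # A non-blocking acquire reports "would block" instead of hanging.
        self.holds_ticket = self.tickets.acquire(blocking=False)
        return self.holds_ticket


tickets = threading.Semaphore(1)
tickets.acquire()  # the only ticket is held by a non-transactional write

old_behavior = StashedTxn(tickets).unstash(reacquire_ticket=True)
agreed_fix = StashedTxn(tickets).unstash(reacquire_ticket=False)
```

With the single ticket taken, `old_behavior` reports that the command would block, while `agreed_fix` proceeds regardless of how many tickets are free.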