[SERVER-41980] Non-transactional commands can deadlock with prepared transactions when the tickets are exhausted by the non-transactional write commands.
Created: 27/Jun/19  Updated: 29/Oct/23  Resolved: 25/Jul/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.2.0-rc5, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-41556 Must handle failure to reacquire lock... Closed
related to SERVER-42398 abortTransaction and commitTransactio... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Repl 2019-07-15, Repl 2019-07-29

 Description   

Assume the number of available write tickets is 1, and consider the following sequence:

1) A transaction is prepared and waits to commit. Once the prepare succeeds on the primary, the transaction stashes its lock resources; as part of this, it releases the ticket but keeps holding the global lock in IX mode.
2) Next, commands not running in a transaction (such as create, find, or insert) arrive and acquire the ticket and the global lock, but then block behind the prepared transaction on a prepare conflict or a database/collection-level lock conflict.
3) Then the commitTransaction command arrives. As part of unstashing the lock resources, it tries to reacquire the ticket, but it cannot get one and blocks behind the non-transactional operations from step 2.

For cross-shard transactions, the transaction coordinator keeps retrying the commitTransaction command until it succeeds, but because of this deadlock the primary makes no progress. The deadlock occurs on the primary because, while unstashing its lock resources, the transaction violates the lock acquisition order: it acquires the ticket while the global lock is already held.
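The cycle can be made explicit as a wait-for graph, in which an edge A -> B means A is waiting for a resource that B holds. The following is a hypothetical Python model, not MongoDB code: the node names `nonTxnWrite` and `preparedTxn` and the helper `find_cycle` are illustrative. It assumes the non-transactional write waits on the prepared transaction's collection lock (or prepare conflict), and that commitTransaction, acting for the prepared transaction, waits on the single ticket the write holds. The global lock is left out, assuming its IX mode does not block the other operations in this scenario.

```python
from typing import Dict, List, Set


def find_cycle(waits_for: Dict[str, List[str]]) -> List[str]:
    """Return one cycle in a wait-for graph (its first node repeated at the end), or []."""
    on_path: Set[str] = set()   # nodes on the current DFS path
    finished: Set[str] = set()  # nodes fully explored without finding a cycle
    path: List[str] = []

    def dfs(node: str) -> List[str]:
        on_path.add(node)
        path.append(node)
        for nxt in waits_for.get(node, []):
            if nxt in on_path:
                # Back edge: the cycle is the part of the path starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if nxt not in finished:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        on_path.discard(node)
        finished.add(node)
        path.pop()
        return []

    for start in waits_for:
        if start not in finished:
            cycle = dfs(start)
            if cycle:
                return cycle
    return []


# Edge A -> B: A waits for a resource that B holds (hypothetical node names).
waits_for = {
    # Step 2: the write holds the only ticket and waits on the prepared
    # transaction's collection lock or prepare conflict.
    "nonTxnWrite": ["preparedTxn"],
    # Step 3: commitTransaction, unstashing for the prepared transaction,
    # waits for the ticket held by the write.
    "preparedTxn": ["nonTxnWrite"],
}

print(find_cycle(waits_for))  # ['nonTxnWrite', 'preparedTxn', 'nonTxnWrite']
```

Removing the `preparedTxn -> nonTxnWrite` edge, that is, letting the commit proceed without waiting for a ticket, leaves a graph with no cycle, so `find_cycle` returns an empty list.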

Note: This is a problem only for prepared transactions (the commitTransaction command combined with a cross-shard transaction), not for unprepared transactions. An unprepared transaction is eventually aborted, either by the transaction reaper or by a request with a higher transaction number (see SERVER-41976), which allows the operations from step 2 to proceed.



 Comments   
Comment by Githook User [ 25/Jul/19 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-41980 Prepared transactions should not acquire ticket on primary.

(cherry picked from commit be06cfaae8872737fe349a8a400f322123307061)
Branch: v4.2
https://github.com/mongodb/mongo/commit/24c1ddcbee28eb2e3901a3cbdd03debde8be48c1

Comment by Githook User [ 25/Jul/19 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-41980 Prepared transactions should not acquire ticket on primary.
Branch: master
https://github.com/mongodb/mongo/commit/be06cfaae8872737fe349a8a400f322123307061

Comment by Githook User [ 23/Jul/19 ]

Author: Ian Boros (puppyofkosh, puppyofkosh@gmail.com)

Message: Revert "SERVER-41980 Prepared transactions should not acquire ticket on primary."

This reverts commit aa4089f9d3abccdf4724c6c49a8bde504359b800.
Branch: v4.2
https://github.com/mongodb/mongo/commit/3bfb22c6ee0d45b7144b5cbe864fa88afe471215

Comment by Githook User [ 23/Jul/19 ]

Author: Ian Boros (puppyofkosh, puppyofkosh@gmail.com)

Message: Revert "SERVER-41980 Prepared transactions should not acquire ticket on primary."

This reverts commit a5d4ab967af9cbba17e6aa5afadca35927bd74c1.
Branch: master
https://github.com/mongodb/mongo/commit/e529848702394e0b030d8d0eb0a61d03950a27a6

Comment by Githook User [ 22/Jul/19 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-41980 Prepared transactions should not acquire ticket on primary.

(cherry picked from commit a5d4ab967af9cbba17e6aa5afadca35927bd74c1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/aa4089f9d3abccdf4724c6c49a8bde504359b800

Comment by Githook User [ 22/Jul/19 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-41980 Prepared transactions should not acquire ticket on primary.
Branch: master
https://github.com/mongodb/mongo/commit/a5d4ab967af9cbba17e6aa5afadca35927bd74c1

Comment by Suganthi Mani [ 09/Jul/19 ]

Quick note: this problem also applies to the abortTransaction command combined with cross-shard transactions, because the transaction coordinator keeps retrying the abortTransaction command as well.

Comment by Suganthi Mani [ 08/Jul/19 ]

Spoke with geert.bosch, and we agreed on a solution in which the commitTransaction command does not reacquire the ticket while unstashing the lock resources (as we already do in FTDC).
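A minimal threaded sketch of the agreed approach, under simplifying assumptions: a `threading.Semaphore(1)` stands in for the single write ticket, a `threading.Lock` stands in for the collection-level lock (or prepare conflict) that the prepared transaction keeps after stashing, and `run_scenario` with its `reacquire_ticket_on_unstash` flag is a hypothetical name, not MongoDB's API. With the flag set, the commit times out waiting for the ticket held by the blocked write; with it cleared, the commit releases the prepared transaction's lock and the write completes.

```python
import threading


def run_scenario(reacquire_ticket_on_unstash: bool, timeout: float = 0.5) -> str:
    # Hypothetical model: one write ticket, plus the lock (or prepare conflict)
    # that the prepared transaction keeps after stashing its resources.
    ticket = threading.Semaphore(1)
    collection_lock = threading.Lock()
    writer_has_ticket = threading.Event()
    writer_result = []

    # Step 1: prepare. Take the ticket and the lock, then stash: the ticket is
    # released, but the lock stays held by the prepared transaction.
    ticket.acquire()
    collection_lock.acquire()
    ticket.release()

    # Step 2: a non-transactional write takes the only ticket, then blocks on
    # the lock held by the prepared transaction.
    def non_txn_write() -> None:
        with ticket:
            writer_has_ticket.set()
            if collection_lock.acquire(timeout=timeout * 4):
                collection_lock.release()
                writer_result.append("write done")
            else:
                writer_result.append("write timed out")

    writer = threading.Thread(target=non_txn_write)
    writer.start()
    writer_has_ticket.wait()

    # Step 3: commitTransaction unstashes the lock resources.
    if reacquire_ticket_on_unstash:
        if not ticket.acquire(timeout=timeout):
            # The writer holds the ticket and waits on our lock: a deadlock.
            # Release the lock anyway so this demo terminates.
            collection_lock.release()
            writer.join()
            return "deadlock"
        ticket.release()

    # The commit completes and releases the prepared transaction's lock,
    # which unblocks the waiting write.
    collection_lock.release()
    writer.join()
    return "committed; " + writer_result[0]


if __name__ == "__main__":
    print(run_scenario(reacquire_ticket_on_unstash=True))   # deadlock
    print(run_scenario(reacquire_ticket_on_unstash=False))  # committed; write done
```

The model skips the ticket only on the commit's unstash path; the non-transactional write still goes through the ticket as before, so the ticket pool keeps throttling ordinary writes.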

Generated at Thu Feb 08 04:59:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.