[SERVER-41883] Unconditional step down should not set a maxLockTimeout while unstashing the lock resources as part of yieldLocksForPreparedTransactions(). Created: 24/Jun/19  Updated: 29/Oct/23  Resolved: 15/Jul/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-41556 Must handle failure to reacquire lock... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Repl 2019-07-01, Repl 2019-07-15, Repl 2019-07-29
Participants:

 Description   

Currently, as part of yieldLocksForPreparedTransactions(), step down unstashes the prepared transaction's lock resources from transactionParticipant to its opCtx with maxLockTimeout set to a non-zero value (5ms by default). This means we reacquire the ticket with a maxLockTimeout set. The reacquisition can therefore time out and fail, and if it is an unconditional step down (step down via hb / force reconfig), the failure can lead to a server crash.
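
To illustrate the difference between a timed and an untimed ticket reacquisition, here is a minimal Python sketch. It is not MongoDB code: `TicketPool` is a hypothetical stand-in for the ticket holder, built on a plain semaphore, and the 5 ms value only mirrors the default mentioned above.

```python
import threading


class TicketPool:
    """Hypothetical stand-in for a ticket holder (not the server's class)."""

    def __init__(self, size):
        self._sem = threading.Semaphore(size)

    def acquire(self, timeout_s=None):
        # timeout_s=None waits until a ticket is free;
        # otherwise give up after timeout_s seconds and return False.
        return self._sem.acquire(timeout=timeout_s)

    def release(self):
        self._sem.release()


pool = TicketPool(1)
assert pool.acquire()  # the only ticket is held by some other operation

# Timed reacquisition (5 ms): fails because no ticket frees up in time.
timed_ok = pool.acquire(timeout_s=0.005)

# Untimed reacquisition: waits until the holder releases its ticket.
threading.Timer(0.05, pool.release).start()
untimed_ok = pool.acquire()

print(timed_ok, untimed_ok)
```

In this sketch the timed attempt gives up while the untimed one simply waits, which is the behavior this ticket asks for during an unconditional step down.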

 

Extra Notes:

Below is a scenario in which we can run out of tickets.
Assume MaxTicketsAvailable=10
1. 10 prepared txns have each acquired a ticket while unstashing their txn resources.
2. An 11th prepared txn is waiting to acquire a ticket.
3. Step down sets the canAcceptNonLocalWrite flag to false with the RSTL lock held in X mode and the repl mutex held.
4. YieldLocksForPreparedTxn - Scans the session catalog and marks the 1st prepared txn as killed.
5. Let’s assume the 1st prepared txn checked in its session (releasing its ticket) because it found out it was killed while acquiring the RSTL lock in IX mode. As a result, one ticket is now available to acquire.
6. The 11th session acquires that ticket. Now, no more tickets are available to be assigned.
7. YieldLocksForPreparedTxn - The session catalog scan marks the 11th transaction to be killed.
8. YieldLocksForPreparedTxn - Now, step down attempts to check in the session and unstash the transaction. To do so, step down has to wait for a ticket to become available, and that wait can time out.
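
The ticket accounting in the steps above can be traced with a small Python sketch. This is only a bookkeeping model under stated assumptions: `Tickets`, `try_acquire`, and the `txnN` labels are hypothetical names, and no real locking or threading is involved.

```python
MAX_TICKETS = 10  # mirrors MaxTicketsAvailable=10 in the scenario


class Tickets:
    """Hypothetical ticket counter used only to trace the scenario."""

    def __init__(self, total):
        self.available = total

    def try_acquire(self):
        # Non-blocking: succeed only if a ticket is free right now.
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def release(self):
        self.available += 1


tickets = Tickets(MAX_TICKETS)

# Step 1: ten prepared transactions each take a ticket.
holders = [f"txn{i}" for i in range(1, 11) if tickets.try_acquire()]

# Step 2: the 11th prepared transaction cannot get a ticket and must wait.
assert not tickets.try_acquire()

# Steps 3-5: txn1 is marked killed, checks in its session, frees its ticket.
holders.remove("txn1")
tickets.release()

# Step 6: the waiting 11th transaction takes the freed ticket.
assert tickets.try_acquire()
holders.append("txn11")

# Steps 7-8: step down now needs a ticket to unstash, but none are left.
step_down_got_ticket = tickets.try_acquire()
print(tickets.available, step_down_got_ticket)
```

With a non-zero maxLockTimeout, the wait in the last step is bounded, so it can expire before any of the ten holders releases a ticket.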



 Comments   
Comment by Githook User [ 25/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state.
SERVER-41883 Replication state transition reacquires locks and tickets of a prepared transaction with no lock timeout.
SERVER-41556 Handles failure to reacquire locks and ticket when unstashing transactions.

(cherry picked from commit 2ff54098b19ebc2b4bbf5516de6e6befb46f9fe7)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7be60542714d7cd17a6ecf5e43a47bb7aef9a0d5

Comment by Githook User [ 25/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}

Message: SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state.
SERVER-41883 Replication state transition reacquires locks and tickets of a prepared transaction with no lock timeout.
SERVER-41556 Handles failure to reacquire locks and ticket when unstashing transactions.
Branch: master
https://github.com/mongodb/mongo/commit/2ff54098b19ebc2b4bbf5516de6e6befb46f9fe7

Comment by Suganthi Mani [ 24/Jul/19 ]

Author:

{'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'}

Message: Revert "SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state."

This reverts commit e707fd09ef0dadbb33510249732fd38c654da914.
Branch: v4.2
https://github.com/mongodb/mongo/commit/65a1db06e9a88e7d96e1359662f5480f939c0e5b

Comment by Suganthi Mani [ 24/Jul/19 ]

Author:

{'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'}

Message: Revert "SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state."

This reverts commit b7cec5064fb03f1e1f9bd39af35e495facfdcdc9.
Branch: master
https://github.com/mongodb/mongo/commit/38e92ef5b9fe645cd73fec3742c0fde9caea0cb2

Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}

Message: SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state.
SERVER-41883 Replication state transition reacquires locks and tickets of a prepared transaction with no lock timeout.
SERVER-41556 Handles failure to reacquire locks and ticket when unstashing transactions.

(cherry picked from commit b7cec5064fb03f1e1f9bd39af35e495facfdcdc9)
Branch: v4.2
https://github.com/mongodb/mongo/commit/e707fd09ef0dadbb33510249732fd38c654da914

Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-41881 Unstashing the transaction lock resources should ignore the saved value of maxLockTimeout and explicitly set the maxLockTimeout based on node's state.
SERVER-41883 Replication state transition reacquires locks and tickets of a prepared transaction with no lock timeout.
SERVER-41556 Handles failure to reacquire locks and ticket when unstashing transactions.
Branch: master
https://github.com/mongodb/mongo/commit/b7cec5064fb03f1e1f9bd39af35e495facfdcdc9
