[SERVER-70642] Avoid acquiring tickets as part of step-up Created: 17/Oct/22 Updated: 29/Oct/23 Resolved: 10/Feb/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Wenbin Zhu | Assignee: | Sean Zimmerman |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | repl-shortlist | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Replication
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Requested: |
v6.3, v6.0, v5.0, v4.4
|
||||||||
| Sprint: | Repl 2023-02-06, Repl 2023-02-20 | ||||||||
| Participants: | |||||||||
| Description |
|
As part of step-up, we sometimes need to acquire tickets, for example when aborting in-progress transactions, we do a query on the transaction table which results in acquiring a ticket for global lock. However step-up shouldn't need to acquire tickets, we should audit the step-up (maybe step-down as well) code and remove (bypass) any unnecessary ticket acquisition. |
| Comments |
| Comment by Githook User [ 10/Feb/23 ] |
|
Author: {'name': 'seanzimm', 'email': 'sean.zimmerman@mongodb.com', 'username': 'seanzimm'}Message: |
| Comment by Wenbin Zhu [ 09/Feb/23 ] |
|
Yep this sounds good to me. |
| Comment by Lingzhi Deng [ 09/Feb/23 ] |
|
Thanks louis.williams@mongodb.com. That makes sense and it aligns with my understanding. How does this sound to you wenbin.zhu@mongodb.com? |
| Comment by Louis Williams [ 09/Feb/23 ] |
|
Skipping ticket acquisition to acquire locks for these transactions makes sense. Tickets are really designed to be held by operations that are actively doing work. If an operation is not active, is blocked on something else, or idle (i.e. stashed), then we risk deadlocks by holding tickets. I believe that skipping tickets here is the correct decision. It is also important that we re-acquire a ticket for future operations on any unprepared transactions. But for prepared transactions, we need to ensure that commit and abort succeed and do not block, so it makes sense to not reacquire a ticket. Let me know if this makes sense. |
| Comment by Lingzhi Deng [ 08/Feb/23 ] |
|
wenbin.zhu@mongodb.com, I agree. So I think we should explore what that implication is and whether it is safe to do so. But from my glance earlier today, I think this should be safe because we will release the ticket immediately in step 2 of my previous comment. So it seems to me we acquire the ticket just to release it after. My understanding of flowControl/the ticket system is that it is used to prevent congestion in the WT layer. Refreshing locks for prepared transactions isn't really a storage operation so I don't think it has to acquire tickets. And we also have But yes, I am curious what louis.williams@mongodb.com thinks. |
| Comment by Wenbin Zhu [ 08/Feb/23 ] |
This kind of raises a concern: the ticket was previously filed because we thought we attempted to acquire tickets for some auxiliary work during state transitions. However now it sees that we want to stop acquiring tickets for transactions that are being unstashed, which has very different safety consideration than the original. I think we may need to double check this with the execution team before deciding if we want to go with this. |
| Comment by Sean Zimmerman [ 08/Feb/23 ] |
|
After digging into it more it looks like it will not work as is. The TxnResources has its own locker which needs to have priority set on it. Even without the stashLocker this wouldn't work as the priority of the opCtx isn't used at all here. I'm going to make a change to fix this |
| Comment by Lingzhi Deng [ 08/Feb/23 ] |
|
I don't think it is inaccurate. But the comment was from |
| Comment by Sean Zimmerman [ 08/Feb/23 ] |
|
Do we think this comment is inaccurate? It seems to imply that ticket acquisition shouldn't fail if we are in stepup / stepdown so we shouldn't need to use AcquireTicket::kSkip |
| Comment by Lingzhi Deng [ 08/Feb/23 ] |
|
I want to correct one thing from the description – This ticket was filed because of a HELP ticket in 4.4.15. The stack trace from HELP-38172 involved MongoDSessionCatalog::onStepUp() and DBDirectClient. We initially thought this was abortInProgressTransactions. But as sean.zimmerman@mongodb.com found, the opCtx used MongoDSessionCatalog::onStepUp already opted out from flowControl. So in 4.4, the DBDirectClient called was probably from checking out the session. However, as part of stepUp, we also reacquire locks for prepared transactions. As part of that, we do two things:
For this, I think we should explore using AcquireTicket::kSkip to refresh lock on stepUp. But I am not entirely sure if opting out of ticket acquisition for the newOpCtx would also work here as _releaseTransactionResourcesToOpCtx uses a stash locker. CC louis.williams@mongodb.com wenbin.zhu@mongodb.com sean.zimmerman@mongodb.com |
| Comment by Louis Williams [ 18/Oct/22 ] |
|
This is an example of the correct way to opt-out of ticket acquisition, using AdmissionPriority::kImmediate. We should exempt the step-up thread from taking tickets in this way. If individual step-up tasks are run on different OperationContexts, we may need to do something to ensure all of these sub-tasks also opt-out of tickets. |