[SERVER-35091] Profiling in transactions should only be interrupted due to lock acquisition failure Created: 18/May/18  Updated: 06/Dec/22  Resolved: 30/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-35298 Profiling transaction operations shou... Backlog
Assigned Teams:
Replication
Backport Requested:
v4.0
Participants:

 Description   

SERVER-34726 removes the UninterruptibleLockGuard protecting lock acquisitions for profiling of transaction operations. This is done to allow lock acquisitions for profiling of transaction operations to have a maxTransactionLockRequestTimeoutMillis timeout. However, the UninterruptibleLockGuard also prevented checking whether the operation was interrupted or timed out, which is desirable behavior. We would like profiling for transaction operations to succeed if the operation was interrupted or timed out, while preserving the maxTransactionLockRequestTimeoutMillis lock acquisition timeout.



 Comments   
Comment by Spencer Brody (Inactive) [ 30/May/18 ]

Closing this in favor of SERVER-35298

Comment by Dianna Hohensee (Inactive) [ 21/May/18 ]

So it looks like the kill flag has migrated from CurOp to OperationContext::_killCode in recent times. The ServiceContext has a killOperation function that calls OperationContext::markKilled. The only complication with temporarily unsetting the _killCode field is that it looks like killOperation/markKilled is called asynchronous and can be called multiple times. I would therefore suggest adding a flag to OperationContext such that we can skip checking the _killCode here in OperationContext::checkForInterruptNoAssert. _killCode is only used for set, get and check for interrupt. We will leave set and get to behave normally, so it remains correct, and just temporarily put checking it on hold. I believe this conforms with the idea of suspending kill/maxTimeMS on the OperationContext.

With that addition, I think the solution we wanted forĀ SERVER-34726 will work again.

Comment by Eric Milkie [ 18/May/18 ]

I investigated more carefully and it doesn't clear like I thought it did. However, we might consider actually clearing the kill flag in CurOp::completeAndLogOperation(), since it would make sense that by the time this function has been called, we've already serviced the kill request. Alternatively, we could destroy the CurOp object after completeAndLogOperation is called (since we've already stopped timing the operation), and then adjust code to handle a Client with no active CurOp attached.

Comment by Tess Avitabile (Inactive) [ 18/May/18 ]

Profiling does succeed for ops that were interrupted or timed out, but my understanding was that this is due to the UninterruptibleLockGuard. Do you know where we might be clearing the killed and maxTimeMS flags?

Comment by Eric Milkie [ 18/May/18 ]

I thought with the way we implemented profiling, the killed and maxTimeMs flags would be cleared when it ran, because it's separate. So I would expect profiling to succeed for ops that were interrupted or timed out?

Generated at Thu Feb 08 04:38:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.