[SERVER-35091] Profiling in transactions should only be interrupted due to lock acquisition failure Created: 18/May/18 Updated: 06/Dec/22 Resolved: 30/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Tess Avitabile (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Replication
|
||||||||||||
| Backport Requested: |
v4.0
|
||||||||||||
| Participants: | |||||||||||||
| Description |
|
|
| Comments |
| Comment by Spencer Brody (Inactive) [ 30/May/18 ] |
|
Closing this in favor of SERVER-35298 |
| Comment by Dianna Hohensee (Inactive) [ 21/May/18 ] |
|
So it looks like the kill flag has migrated from CurOp to OperationContext::_killCode in recent times. The ServiceContext has a killOperation function that calls OperationContext::markKilled. The only complication with temporarily unsetting the _killCode field is that it looks like killOperation/markKilled is called asynchronous and can be called multiple times. I would therefore suggest adding a flag to OperationContext such that we can skip checking the _killCode here in OperationContext::checkForInterruptNoAssert. _killCode is only used for set, get and check for interrupt. We will leave set and get to behave normally, so it remains correct, and just temporarily put checking it on hold. I believe this conforms with the idea of suspending kill/maxTimeMS on the OperationContext. With that addition, I think the solution we wanted forĀ |
| Comment by Eric Milkie [ 18/May/18 ] |
|
I investigated more carefully and it doesn't clear like I thought it did. However, we might consider actually clearing the kill flag in CurOp::completeAndLogOperation(), since it would make sense that by the time this function has been called, we've already serviced the kill request. Alternatively, we could destroy the CurOp object after completeAndLogOperation is called (since we've already stopped timing the operation), and then adjust code to handle a Client with no active CurOp attached. |
| Comment by Tess Avitabile (Inactive) [ 18/May/18 ] |
|
Profiling does succeed for ops that were interrupted or timed out, but my understanding was that this is due to the UninterruptibleLockGuard. Do you know where we might be clearing the killed and maxTimeMS flags? |
| Comment by Eric Milkie [ 18/May/18 ] |
|
I thought with the way we implemented profiling, the killed and maxTimeMs flags would be cleared when it ran, because it's separate. So I would expect profiling to succeed for ops that were interrupted or timed out? |