[SERVER-69496] InterruptedAtShutdown can be thrown without the operation context being marked as killed Created: 07/Sep/22 Updated: 27/Oct/23 Resolved: 03/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Noma | Assignee: | Matt Diener (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Sprint: | Service Arch 2022-09-19, Service Arch 2022-10-03, Service Arch 2022-10-17 |
| Participants: |
| Linked BF Score: | 151 |
| Description |
|
During shutdown, we first set the global kill flag and then mark each operation context as killed. In OperationContext::checkForInterruptNoAssert, we return InterruptedAtShutdown if the global kill flag is set. However, we do not mark the operation context as killed from here. This means that if an operation explicitly checks for interrupts and we get an interleaving where this occurs between the two steps in ServiceContext::setKillAllOperations, that operation will get an InterruptedAtShutdown exception with the operation context not (yet) marked as killed. |
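The interleaving described above can be reproduced in a small single-threaded model. This is only a sketch: `ToyServiceContext`, `ToyOperationContext`, `Observation`, and `observeBetweenShutdownSteps` are hypothetical stand-ins written for illustration, not the real server classes, and the two shutdown steps are run by hand so the window between them is deterministic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for illustration; NOT the real MongoDB classes.
enum class Code { OK, InterruptedAtShutdown };

struct ToyServiceContext {
    std::atomic<bool> globalKill{false};  // models the global kill flag
};

struct ToyOperationContext {
    explicit ToyOperationContext(ToyServiceContext* s) : service(s) {
        assert(s != nullptr);
    }

    // True only once this context itself has been marked as killed.
    bool isKillPending() const { return killStatus.load() != Code::OK; }

    // Models the behavior described in the ticket: shutdown is reported from
    // the global flag, and the context is NOT marked as killed here.
    Code checkForInterruptNoAssert() const {
        if (service->globalKill.load()) {
            return Code::InterruptedAtShutdown;
        }
        return killStatus.load();
    }

    ToyServiceContext* service;
    std::atomic<Code> killStatus{Code::OK};
};

// What an operation sees if it checks for interrupts between the two steps.
struct Observation {
    Code status;
    bool killPending;
};

Observation observeBetweenShutdownSteps() {
    ToyServiceContext svc;
    ToyOperationContext opCtx(&svc);

    svc.globalKill.store(true);  // step 1: set the global kill flag

    // The operation checks for interrupts inside the window.
    Observation seen{opCtx.checkForInterruptNoAssert(), opCtx.isKillPending()};

    opCtx.killStatus.store(Code::InterruptedAtShutdown);  // step 2: mark killed
    return seen;
}
```

In this model, the observation taken inside the window reports `InterruptedAtShutdown` while `isKillPending()` is still false, which is the state the ticket describes.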
| Comments |
| Comment by Matt Diener (Inactive) [ 03/Oct/22 ] |
|
Instead of this task, |
| Comment by Matt Diener (Inactive) [ 29/Sep/22 ] |
|
Update: sent out 3 PRs for |
| Comment by Matt Diener (Inactive) [ 27/Sep/22 ] |
|
Moving this back to open, but keeping it open until the completion of |
| Comment by Louis Williams [ 21/Sep/22 ] |
|
Does this mean that every catch() that checks isKillPending must now also check whether the thrown exception was InterruptedAtShutdown? That seems quite risky for managing exceptions in the server and for determining whether an operation was interrupted. We already had to handle some test fallout from this requirement (e.g. |
| Comment by Max Hirschhorn [ 20/Sep/22 ] |
matt.diener@mongodb.com, can you clarify the instructions and possible solutions in the tickets linked to SERVER-56251 (e.g. |
| Comment by Matt Diener (Inactive) [ 20/Sep/22 ] |
|
The design is unfortunately confusing, but there is a distinction between Interrupted and Killed in the opCtx. If we are asking whether work associated with an opCtx should stop, we should use `checkForInterruptNoAssert` to get a Status. If we care about whether the opCtx has been killed, or explicitly want to know the reason it was killed, we should use `getKillStatus()`. Some use cases may require both, as `checkForInterruptNoAssert` is not guaranteed to return the kill Status. |
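
As a rough illustration of using both checks together, the sketch below keeps "should this work stop?" separate from "has this context been killed, and why?". `ToyOpCtx`, `ToyCode`, `StopDecision`, and `decide` are hypothetical names invented for this example; only the method names `checkForInterruptNoAssert` and `getKillStatus` come from the comment above, and their toy bodies are assumptions rather than the server's implementation.

```cpp
#include <cassert>

// Hypothetical toy types for illustration; NOT the real server API.
enum class ToyCode { OK, Interrupted, InterruptedAtShutdown };

class ToyOpCtx {
public:
    // Answers "should work stop?". May report a reason to stop before the
    // context itself has been marked as killed.
    ToyCode checkForInterruptNoAssert() const {
        if (_shutdownInProgress) {
            return ToyCode::InterruptedAtShutdown;
        }
        return _killStatus;
    }

    // Answers "has this context been killed, and with what reason?".
    ToyCode getKillStatus() const { return _killStatus; }

    void setShutdownInProgress() { _shutdownInProgress = true; }

    void markKilled(ToyCode reason) {
        assert(reason != ToyCode::OK);
        _killStatus = reason;
    }

private:
    bool _shutdownInProgress = false;
    ToyCode _killStatus = ToyCode::OK;
};

struct StopDecision {
    bool shouldStop;     // derived from the interrupt check
    bool ctxKilled;      // derived from the kill status
    ToyCode killReason;  // ToyCode::OK when the context is not killed
};

// Consults both sources instead of assuming they always agree.
StopDecision decide(const ToyOpCtx& opCtx) {
    const ToyCode interrupt = opCtx.checkForInterruptNoAssert();
    const ToyCode kill = opCtx.getKillStatus();
    return StopDecision{interrupt != ToyCode::OK, kill != ToyCode::OK, kill};
}
```

With this toy model, a caller that sees `shouldStop` set while `ctxKilled` is still false is looking at the shutdown window from the description, and can treat it as an interruption without relying on the kill status alone.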