[SERVER-67611] Stop using ErrorCategory::Interruption in Execution codebase Created: 28/Jun/22 Updated: 29/Oct/23 Resolved: 02/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matt Diener (Inactive) | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution Team 2022-09-05 |
| Participants: | |
| Linked BF Score: | 153 |
| Description |
|
Context
The definition of `ErrorCategory::Interruption` has changed over time. When investigating SERVER-56251, we determined that some parts of our codebase use `ErrorCategory::Interruption` as an indicator that the opCtx was killed, while other pieces of code simply use it to indicate that something was interrupted in a specific API. Bugs have repeatedly been fixed by expanding this error category to include more errors, which has eroded the utility of the category. Pieces of code which catch this error category as an indication that the opCtx was killed are unfortunately incorrect because:
Pieces of code which catch this error category as an indication of their API being interrupted are possibly incorrect because the definition of this error category has changed over time. The end goal of this and related bugs is to eliminate all uses of `ErrorCategory::Interruption` and then remove the category altogether.

Acceptance criteria
Remove references to `ErrorCategory::Interruption` in the files specified below. Understand the intention of the existing usage and use your judgement to re-implement that logic in a more robust way.

Files in question
It's possible not all of these are owned by your team. Please reach out to matt.diener@mongodb.com if we should re-assign a subset of this work elsewhere.

Solution(s)
Here are some potential fixes:
1) If we are catching the exception and assuming the opCtx is cancelled, we should catch ALL exceptions and check the opCtx directly.
2) If we are using the `Interruption` category for something that has nothing to do with the opCtx, create a new category in `error_codes.yml` that is tied to the component using the category. Be deliberate about exactly which errors belong to that category. The expansion of the `Interruption` category caused that component's behavior to be altered slightly.
3) If we are encountering an assert that an error we have fits this category, investigate what the error category was meant to indicate and find another thing to assert on, or use a new error category if necessary.
4) If none of the above apply, use your best judgement and consider reaching out to matt.diener@mongodb.com to discuss ways this can be resolved. |
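As a rough illustration of fix 1), the sketch below catches every exception and then asks the operation context itself whether it was killed, rather than inferring that from the exception's error category. It is not taken from the server codebase: `FakeOpCtx`, its `isKillPending()` member, `Outcome`, and `runAndClassify` are hypothetical stand-ins invented for this example, and real call sites would most likely rethrow or convert the exception instead of returning an enum.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the server's operation context.
// Only the "was this operation killed?" query is modeled here.
struct FakeOpCtx {
    bool killed = false;
    bool isKillPending() const { return killed; }
};

// Hypothetical result type used only to make the decision visible.
enum class Outcome { kCompleted, kOpCtxKilled, kFailedForOtherReason };

// Runs `work` and classifies a failure by inspecting the opCtx directly.
// The type or category of the thrown exception is deliberately ignored.
Outcome runAndClassify(const FakeOpCtx& opCtx, const std::function<void()>& work) {
    try {
        work();
        return Outcome::kCompleted;
    } catch (...) {
        // Catch ALL exceptions, then check the source of truth.
        if (opCtx.isKillPending()) {
            return Outcome::kOpCtxKilled;
        }
        return Outcome::kFailedForOtherReason;
    }
}
```

With this shape, adding or removing codes from an error category cannot change whether the caller believes the operation was killed; only the state of the opCtx can.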
| Comments |
| Comment by Githook User [ 02/Sep/22 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'} Message: |