[SERVER-67618] Stop using ErrorCategory::Interruption in Sharding codebase Created: 28/Jun/22  Updated: 29/Oct/23  Resolved: 22/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Matt Diener (Inactive) Assignee: Abdul Qadeer
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-56251 Alleviate problems that arise when Op... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2022-08-22
Participants:
Story Points: 2

 Description   

Context

`ErrorCategory::Interruption` has had its definition change over time. When investigating SERVER-56251, we determined that some parts of our codebase use `ErrorCategory::Interruption` as an indicator that the opCtx was killed, while other pieces of code simply use it to indicate that something was interrupted in a specific API.

Over time, bugs have been fixed by expanding this error category to include more errors, which has eroded the utility of the category.

Pieces of code which catch this error category as an indication that the opCtx was killed are unfortunately incorrect because:

  1. An Interruption error can be observed without a killed opCtx (these errors are generic).
  2. A non-`Interruption` error can be observed when the opCtx is killed. The opCtx never guaranteed that a kill surfaces as an `Interruption`-category error, and investigations are showing it cannot make that guarantee without even further expanding the `Interruption` category.
  3. Even if the opCtx always threw an `Interruption`-category error, there is nothing stopping some other in-between layer from catching that exception and throwing something else.
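The three points above can be illustrated with a small standalone sketch. Everything in it (`FakeOpCtx`, `FakeError`, `Code`, `isInterruptionLike`, `middleLayer`, `apiInterrupted`, `observe`) is a hypothetical stand-in invented for this example, not a MongoDB type or API. It only models the idea that a category check on the caught exception and the actual kill state of the operation can disagree in both directions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; not the real MongoDB types.
enum class Code { OK, Interrupted, NetworkTimeout };

struct FakeError : std::runtime_error {
    Code code;
    FakeError(Code c, const std::string& msg) : std::runtime_error(msg), code(c) {}
};

// Models a category predicate ("does this code belong to the Interruption category?").
inline bool isInterruptionLike(Code c) {
    return c == Code::Interrupted;
}

struct FakeOpCtx {
    Code killCode = Code::OK;
    bool isKilled() const { return killCode != Code::OK; }
    void checkForInterrupt() const {
        if (isKilled()) throw FakeError(killCode, "operation was killed");
    }
};

// Point 3: an in-between layer catches the kill and throws something else.
void middleLayer(const FakeOpCtx& opCtx) {
    try {
        opCtx.checkForInterrupt();
    } catch (const FakeError&) {
        throw FakeError(Code::NetworkTimeout, "remote call failed");
    }
}

// Point 1: an Interruption-like error is raised although nothing was killed.
void apiInterrupted(const FakeOpCtx&) {
    throw FakeError(Code::Interrupted, "API-level interruption");
}

struct Observation {
    bool categorySaysKilled;  // conclusion drawn from the exception's category
    bool opCtxSaysKilled;     // answer obtained by asking the opCtx directly
};

// Runs `work` and records what each detection strategy would conclude.
Observation observe(const FakeOpCtx& opCtx, void (*work)(const FakeOpCtx&)) {
    try {
        work(opCtx);
    } catch (const FakeError& e) {
        return {isInterruptionLike(e.code), opCtx.isKilled()};
    }
    return {false, opCtx.isKilled()};
}
```

Calling `observe` with a killed `FakeOpCtx` and `middleLayer` yields a false negative from the category check, while an unkilled `FakeOpCtx` with `apiInterrupted` yields a false positive. In both cases only the direct query on the operation state is right.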

Pieces of code which catch this error category as an indication of their API being interrupted are possibly incorrect because the definition of this error category has changed over time.

The end goal of this and related bugs is to eliminate all uses of `ErrorCategory::Interruption` and then remove the category altogether.

 

Acceptance criteria

Remove references to `ErrorCategory::Interruption` in the files specified below. Understand the intention of the existing usage and use your judgement to re-implement that logic in a more robust way.

 

Files in question

It's possible not all of these are owned by your team. Please reach out to matt.diener@mongodb.com if we should re-assign a subset of this work elsewhere.

src/mongo/db/s/*

1) sharding_catalog_manager.cpp: investigate ShardingCatalogManager::withTransaction
2) resharding_oplog_fetcher.cpp: ReshardingOplogFetcher::iterate

Solution(s)

Here are some potential fixes:

1) If we are catching the exception and assuming the opCtx is cancelled, we should catch ALL exceptions and check the opCtx directly:

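catch (const DBException& e) {
    ...
    // getKillStatus() returns an ErrorCodes::Error, so compare it against ErrorCodes::OK.
    if (opCtx->getKillStatus() != ErrorCodes::OK) {
        // We now know the opCtx is actually killed.
        // We do not know whether this exception was raised by the opCtx.
    }
    throw; // if appropriate
}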

 

2) If we are using the `Interruption` category for something that has nothing to do with the opCtx, create a new category in `error_codes.yml` that is tied to the component using the category. Be deliberate about exactly which errors belong to that category. The expansion of the `Interruption` category caused that component's behavior to be altered slightly.
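As a rough standalone illustration of option 2, the sketch below contrasts a broad shared category with a narrow one owned by a single component. All names here (`SketchCode`, `isBroadInterruption`, `isFetcherRestartable`, and the individual enumerators) are hypothetical and exist only for this example. In the real codebase the category would be declared in `error_codes.yml` rather than hand-written like this.

```cpp
#include <cassert>

// Hypothetical error codes for illustration only.
enum class SketchCode {
    OK,
    Interrupted,
    InterruptedAtShutdown,
    ExceededTimeLimit,
    FetcherCancelled,
    FetcherSourceUnavailable
};

// Broad, shared category: other fixes keep adding codes to it over time.
inline bool isBroadInterruption(SketchCode c) {
    switch (c) {
        case SketchCode::Interrupted:
        case SketchCode::InterruptedAtShutdown:
        case SketchCode::ExceededTimeLimit:  // imagine this was added for an unrelated bug
            return true;
        default:
            return false;
    }
}

// Narrow category owned by one component. Membership is listed explicitly,
// so widening the broad category above cannot change this component's behavior.
inline bool isFetcherRestartable(SketchCode c) {
    switch (c) {
        case SketchCode::FetcherCancelled:
        case SketchCode::FetcherSourceUnavailable:
            return true;
        default:
            return false;
    }
}
```

With this split, a code such as `ExceededTimeLimit` joining the broad category leaves `isFetcherRestartable` untouched, which is the kind of isolation the deliberate membership list is meant to provide.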

 

3) If we are encountering an assert that an error we have fits this category, investigate what the error category was meant to indicate and find another thing to assert on, or use a new error category if necessary.

 

4) If none of the above apply, use your best judgement and consider reaching out to matt.diener@mongodb.com to discuss ways this can be resolved.



 Comments   
Comment by Githook User [ 22/Aug/22 ]

Author: Abdul Qadeer <abdul.qadeer@mongodb.com> (zorro786)

Message: SERVER-67618 Remove usage of ErrorCategory::Interruption in Sharding
Branch: master
https://github.com/mongodb/mongo/commit/ba75ef94152996b7b61b2da9f22a772cb160d731
