[SERVER-85912] Audit code paths explicitly checking for ErrorCodes::Interrupted Created: 30/Jan/24  Updated: 31/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Yujin Kang Park
Assignee: Backlog - Catalog and Routing
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-56251 Alleviate problems that arise when Op... Backlog
is related to SERVER-70010 Stop using getKillStatus to check for... Closed
Assigned Teams:
Catalog and Routing
Participants:

 Description   

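In the context of load shedding, a new error code would be used to mark an operation as killed (via markKilled()) rather than ErrorCodes::Interrupted. Audit the code paths that explicitly compare a status against ErrorCodes::Interrupted, or that catch an exception with this error code, since they would not recognize an operation killed with the new code.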
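The following is a minimal, self-contained sketch of the failure mode the audit targets. It does not use the real MongoDB headers: ErrorCode, isInterruption() and the two handledBy*() functions are simplified stand-ins, and InterruptedDueToLoadShedding is a hypothetical name for the new kill code. It also assumes, purely for illustration, that the new code would be a member of the interruption category. An exact comparison with Interrupted silently misses the new code, whereas a category check would not (the catch-by-code form behaves the same way as the comparison).

```cpp
#include <cassert>

// Simplified stand-in for MongoDB's ErrorCodes; not the real enum.
enum class ErrorCode {
    OK,
    Interrupted,                   // what killOp uses
    InterruptedDueToLoadShedding,  // HYPOTHETICAL new kill code
    NetworkTimeout,
};

// Stand-in for category membership. ASSUMPTION: the new code is placed
// in the interruption category alongside Interrupted.
bool isInterruption(ErrorCode code) {
    switch (code) {
        case ErrorCode::Interrupted:
        case ErrorCode::InterruptedDueToLoadShedding:
            return true;
        default:
            return false;
    }
}

// The pattern under audit: only the exact code is recognized.
bool handledByExplicitCheck(ErrorCode code) {
    assert(code != ErrorCode::OK);  // only called for failures
    return code == ErrorCode::Interrupted;
}

// Category-based check: recognizes any code placed in the category.
bool handledByCategoryCheck(ErrorCode code) {
    assert(code != ErrorCode::OK);  // only called for failures
    return isInterruption(code);
}
```

With this sketch, handledByExplicitCheck(ErrorCode::InterruptedDueToLoadShedding) returns false even though the operation was interrupted, which is exactly the kind of call site that would need review.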



 Comments   
Comment by Max Hirschhorn [ 30/Jan/24 ]

The intention of SERVER-56251 and its linked tickets was to remove ErrorCategory::Interruption in favor of checking whether the OperationContext has been interrupted (see SERVER-70010). Note, however, that this work is not fully complete; several non-test call sites still use the category:
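To illustrate the direction described above (asking the operation whether it was interrupted instead of inspecting the error's code or category), here is a minimal sketch under stated assumptions. FakeOperationContext is a hypothetical stand-in, not the real OperationContext; only the idea of markKilled() comes from this ticket, and isKilled(), killCode() and failedBecauseInterrupted() are invented for illustration. InterruptedDueToLoadShedding is again a hypothetical name for the new kill code.

```cpp
#include <cassert>

// Simplified stand-in for MongoDB's ErrorCodes; not the real enum.
enum class ErrorCode {
    OK,
    Interrupted,
    InterruptedDueToLoadShedding,  // HYPOTHETICAL new kill code
    WriteConflict,
};

// HYPOTHETICAL, heavily simplified stand-in for OperationContext.
class FakeOperationContext {
public:
    // Mirrors the idea of markKilled(): record why the operation was killed.
    void markKilled(ErrorCode killCode) {
        assert(killCode != ErrorCode::OK);
        _killCode = killCode;
    }
    bool isKilled() const { return _killCode != ErrorCode::OK; }
    ErrorCode killCode() const { return _killCode; }

private:
    ErrorCode _killCode = ErrorCode::OK;
};

// Decide whether a failure stems from this operation being interrupted by
// consulting the operation context, so any kill code is handled uniformly.
bool failedBecauseInterrupted(const FakeOperationContext& opCtx, ErrorCode failure) {
    return failure != ErrorCode::OK && opCtx.isKilled();
}
```

The design point is that a call site written this way does not need to change when a new kill code is introduced, because it never enumerates the codes.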

$ git grep ErrorCategory::Interruption -- src/mongo/ ':!*test*'
src/mongo/db/periodic_runner_job_abort_expired_transactions.cpp:132:            } catch (ExceptionForCat<ErrorCategory::Interruption>& ex) {
src/mongo/db/repl/noop_writer.cpp:121:            } catch (ExceptionForCat<ErrorCategory::Interruption>& ex) {
src/mongo/db/repl/tenant_migration_donor_service.cpp:172:                                                          ErrorCategory::Interruption> {
src/mongo/db/s/config/configsvr_remove_shard_command.cpp:132:            } catch (const ExceptionForCat<ErrorCategory::Interruption>&) {
src/mongo/db/s/config/configsvr_transition_to_dedicated_config_server_command.cpp:124:            } catch (const ExceptionForCat<ErrorCategory::Interruption>&) {
src/mongo/db/s/ddl_lock_manager.h:128:         *     ErrorCategory::Interruption in case the operation context is interrupted.
src/mongo/db/s/ddl_lock_manager.h:158:         *     ErrorCategory::Interruption in case the operation context is interrupted.
src/mongo/db/s/sharding_ddl_coordinator.cpp:550:        status.isA<ErrorCategory::Interruption>() ||

The killOp command interrupts an operation with ErrorCodes::Interrupted. If an internal operation can be interrupted with a new error code, then there are more places to audit than those that explicitly check for ErrorCodes::Interrupted: every place that handles an interruption needs to be reviewed for how the new error should be handled. For example, is simply retrying sufficient to handle this new error?
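To make the retry question concrete, here is a minimal sketch of what "simply retrying" could look like. It is not a proposed fix: retryOnLoadShedding(), RetryResult and InterruptedDueToLoadShedding are all hypothetical names, and the policy (retry only on the new code, return a plain Interrupted from killOp immediately, cap the number of attempts) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <functional>

// Simplified stand-in for MongoDB's ErrorCodes; not the real enum.
enum class ErrorCode {
    OK,
    Interrupted,                   // killOp: treated as final here
    InterruptedDueToLoadShedding,  // HYPOTHETICAL new kill code
};

struct RetryResult {
    ErrorCode code;  // result of the last attempt
    int attempts;    // how many times the attempt was run
};

// Run `attempt` and retry only while it fails with the hypothetical
// load-shedding code, up to maxAttempts tries in total. Any other result
// (success, or Interrupted from killOp) is returned immediately.
RetryResult retryOnLoadShedding(const std::function<ErrorCode()>& attempt, int maxAttempts) {
    assert(maxAttempts > 0);
    ErrorCode result = ErrorCode::OK;
    int tries = 0;
    while (tries < maxAttempts) {
        result = attempt();
        ++tries;
        if (result != ErrorCode::InterruptedDueToLoadShedding) {
            break;
        }
    }
    return RetryResult{result, tries};
}
```

Without the cap, a caller that is shed repeatedly would loop indefinitely, so even a "simple" retry has to decide how many attempts are acceptable and what to do once they are exhausted.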

Generated at Thu Feb 08 06:58:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.