[SERVER-79758] Always release `opCtx` before cleaning up session Created: 04/Aug/23  Updated: 23/Aug/23  Resolved: 23/Aug/23

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Jason Chan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-80001 Delist opCtx during ClientDisconnect ... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Service Arch 2023-08-21, Service Arch 2023-09-04
Participants:
Linked BF Score: 20

 Description   

An operation-fatal error that kills the opCtx (e.g., a connection failure) while running an exhaust operation can result in a new opCtx being created to clean up exhaust resources while the old one is still attached to the client. This triggers a tassert and terminates the connection thread before it can clean up exhaust resources. Consider the following:

  • handleRequest notices that the connection is closed, so it kills the operation and returns a ConnectionError.
  • This translates to calling _onLoopError from here, while the original opCtx associated with WorkItem is still in scope.
  • _onLoopError calls into _cleanupSession, which tries to delist the operation but does not succeed in this case because the operation was already killed due to the connection error.
  • Next, the thread calls into _cleanupExhaustResources, which attempts to make a new opCtx in order to run killExhaust. The original opCtx is still attached to the client at this point, so the tassert fires.

A possible fix is to always destroy the opCtx as part of running _cleanupSession.
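
The sequence above and the proposed fix can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions. The `Client` and `OperationContext` classes below are toy stand-ins, not the real server classes. `cleanupAfterLoopError` and `simulate` are hypothetical helpers invented for illustration. A thrown `std::logic_error` stands in for the tassert. The only behavior taken from the description is that a client may have a single opCtx attached at a time and that cleanup needs to create a fresh one.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Toy stand-ins for illustration only; NOT the real MongoDB classes.
class Client;

class OperationContext {
public:
    explicit OperationContext(Client& client);  // attaches itself to the client
    ~OperationContext();                        // detaches itself from the client
    void kill() { _killed = true; }
    bool isKilled() const { return _killed; }

private:
    Client& _client;
    bool _killed = false;
};

class Client {
public:
    // Models the invariant from the description: one opCtx per client at a time.
    // The exception is an assumed stand-in for the tassert.
    std::unique_ptr<OperationContext> makeOperationContext() {
        if (_attached) {
            throw std::logic_error("client already has an opCtx attached");
        }
        return std::make_unique<OperationContext>(*this);
    }
    bool hasOpCtx() const { return _attached != nullptr; }

private:
    friend class OperationContext;
    OperationContext* _attached = nullptr;
};

inline OperationContext::OperationContext(Client& client) : _client(client) {
    client._attached = this;
}
inline OperationContext::~OperationContext() {
    _client._attached = nullptr;
}

// Hypothetical cleanup path. Returns true if the exhaust cleanup step was able
// to create its own opCtx, false if the single-opCtx invariant was violated.
bool cleanupAfterLoopError(Client& client,
                           std::unique_ptr<OperationContext>& workItemOpCtx,
                           bool releaseFirst) {
    if (releaseFirst) {
        workItemOpCtx.reset();  // proposed fix: destroy the old opCtx first
    }
    try {
        auto cleanupOpCtx = client.makeOperationContext();
        // A real implementation would run killExhaust with cleanupOpCtx here.
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}

// Replays the scenario: the operation is killed by a connection error while
// its opCtx is still in scope, then the cleanup path runs.
bool simulate(bool releaseFirst) {
    Client client;
    auto opCtx = client.makeOperationContext();
    opCtx->kill();  // a killed opCtx is still attached to the client
    return cleanupAfterLoopError(client, opCtx, releaseFirst);
}
```

In this model, `simulate(false)` reproduces the failure because the killed opCtx is still attached when cleanup asks for a new one, while `simulate(true)` succeeds because the old opCtx is destroyed before the cleanup opCtx is created.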

