[SERVER-79202] PinnedConnectionTaskExecutor can hang when shutting down Created: 21/Jul/23  Updated: 30/Jan/24  Resolved: 27/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc4, 7.0.6, 6.0.14

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: George Wangensteen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0
Sprint: Service Arch 2023-08-21, Service Arch 2023-09-04, Service Arch 2023-09-18, Service Arch 2023-10-02
Participants:
Linked BF Score: 5

 Description   

PinnedConnectionTaskExecutor (PCTE) creates a lambda that captures a shared_ptr to itself here: https://github.com/mongodb/mongo/blob/4ff092d683be418230ef28fa3f3c81833b82c570/src/mongo/executor/pinned_connection_task_executor.cpp#L323. That lambda can extend the PCTE's lifetime, and it is scheduled to run on the ThreadPoolTaskExecutor (TPTE) that the PCTE proxies over.

If the shared_ptr stored in that lambda is the last reference to the PinnedConnectionTaskExecutor, the PinnedConnectionTaskExecutor will be destroyed upon that lambda's destruction. This can result in the destruction of the PCTE's shared_ptr to the underlying TPTE, which will call join on that TPTE. However, because the lambda is executed on one of the TPTE's own threads, we violate the TaskExecutor::join contract and may hang (we are stuck waiting for our own callback to finish).
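
The sequence above can be modelled in a few lines of standard C++. This is a minimal sketch, not the server code: MiniExecutor, MiniPinned, and callbackTriggersSelfJoin are hypothetical stand-ins. The model holds the underlying executor by raw pointer rather than shared_ptr, and its join() only records that it was called from the worker thread instead of blocking, so the example terminates.

```cpp
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the underlying executor (the TPTE role).
// A real join() would block until the worker finishes; here we only
// record whether join() was invoked from the worker thread itself.
struct MiniExecutor {
    std::thread::id workerId;
    bool selfJoinAttempted = false;

    void join() {
        if (std::this_thread::get_id() == workerId) {
            selfJoinAttempted = true;  // a blocking join here could never return
        }
    }
};

// Hypothetical stand-in for the pinned executor (the PCTE role).
// Destroying it tears down the underlying executor, which joins.
struct MiniPinned {
    MiniExecutor* underlying;
    explicit MiniPinned(MiniExecutor* e) : underlying(e) {}
    ~MiniPinned() { underlying->join(); }
};

// Returns true when the last reference is dropped on the worker thread,
// i.e. when the destructor ends up joining from inside the callback.
bool callbackTriggersSelfJoin() {
    MiniExecutor exec;
    auto pinned = std::make_shared<MiniPinned>(&exec);

    // The callback captures the only remaining shared_ptr to the pinned executor.
    std::thread worker([&exec, self = std::move(pinned)]() mutable {
        exec.workerId = std::this_thread::get_id();
        // ... the callback body would run here ...
        self.reset();  // callback state destroyed: last reference dropped on the worker
    });

    worker.join();  // joining from the main thread is fine
    return exec.selfJoinAttempted;
}
```

In this model the destructor runs on the worker because the callback owns the last reference. Keeping another owner alive outside the executor's threads moves the destruction, and therefore the join, off the worker.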

This should only be a problem in tests, because the only production use of PinnedConnectionTaskExecutor is TaskExecutorCursor, which uses the mongotExecutor as the underlying TPTE. A shared_ptr to that object is stored in a ServiceContext decoration and is only destroyed when the ServiceContext is. Since search is mongod-only, and mongod never destroys the ServiceContext in production (and all user operations/clients are destroyed first), it shouldn't be possible for any PCTE callback to hold the last reference to the underlying TPTE.



 Comments   
Comment by Githook User [ 30/Jan/24 ]

Author: George Wangensteen <george.wangensteen@mongodb.com> (username: gewa24)

Message: SERVER-79202 Prevent PinnedConnectionExecutor from destroying itself

GitOrigin-RevId: 246d4a2cb18cc10e43add7312da879338a339984
Branch: v6.0
https://github.com/mongodb/mongo/commit/8e69bfd90047a13304fc0b4d23bbd5b1ff61d136

Comment by Githook User [ 23/Jan/24 ]

Author: George Wangensteen <george.wangensteen@mongodb.com> (username: gewa24)

Message: SERVER-79202 Prevent PinnedConnectionExecutor from destroying itself

GitOrigin-RevId: 1ec5865e595d435c7983d2b069f210aa973136f5
Branch: v7.0
https://github.com/mongodb/mongo/commit/5929b17ca2a13d9b4c07e719c914638ba1d8677e

Comment by Githook User [ 27/Sep/23 ]

Author: George Wangensteen <george.wangensteen@mongodb.com> (username: gewa24)

Message: SERVER-79202 Prevent PinnedConnectionExecutor from destroying itself
Branch: master
https://github.com/mongodb/mongo/commit/8d46e0997dac4dd9354a5c2a346ddda3e2d428fb
