[SERVER-77591] Properly shut down executors in these files owned by Sharding NYC Created: 30/May/23  Updated: 27/Oct/23  Resolved: 01/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Brett Nawrocki
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Sharding NYC
Operating System: ALL
Backport Requested:
v7.0
Participants:
Linked BF Score: 113

 Description   

The following files contain executors that may not be properly shut down, that is, by calling both shutdown and join.

  • client/server_discovery_monitor.cpp
    • 560,11: result->startup();
  • db/s/global_index/global_index_cloning_service.cpp
    • 113,28:     _execForCancelableOpCtx->startup();
  • db/s/move_primary/move_primary_donor_service.cpp
    • 222,24:     _markKilledExecutor->startup();
  • db/s/move_primary/move_primary_recipient_service.cpp
    • 219,24:     _markKilledExecutor->startup();
  • db/s/resharding/resharding_coordinator_service.cpp
    • 1624,24:     _markKilledExecutor->startup();
  • db/s/resharding/resharding_donor_service.cpp
    • 456,24:     _markKilledExecutor->startup();
  • db/s/resharding/resharding_recipient_service.cpp
    • 474,24:     _markKilledExecutor->startup();

To reference the full list of files/executors combed, please refer to the linked BF ticket.
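For readers unfamiliar with the startup/shutdown/join pattern the ticket refers to, here is a minimal sketch. ToyExecutor and runAndShutDown are hypothetical names made up for illustration; this is not the MongoDB ThreadPool or ThreadPoolTaskExecutor API, only an approximation of the lifecycle using the standard library.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical toy executor (NOT the MongoDB API): one worker thread, one queue.
class ToyExecutor {
public:
    // Launches the worker thread.
    void startup() {
        _worker = std::thread([this] { _run(); });
    }

    // Queues a task; returns false once shutdown has begun.
    bool schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_shutdown) return false;
        _tasks.push_back(std::move(task));
        _cv.notify_one();
        return true;
    }

    // Stops accepting new work. Does NOT wait for the worker to exit.
    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
        _cv.notify_all();
    }

    // Blocks until the worker has drained the queue and exited.
    void join() {
        if (_worker.joinable()) _worker.join();
    }

    bool joinable() const { return _worker.joinable(); }

private:
    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _cv.wait(lk, [this] { return _shutdown || !_tasks.empty(); });
            if (_tasks.empty()) return;  // only reachable after shutdown()
            auto task = std::move(_tasks.front());
            _tasks.pop_front();
            lk.unlock();
            task();
            lk.lock();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
    bool _shutdown = false;
    std::thread _worker;
};

// Schedules n tasks, then performs the full shutdown + join sequence.
// Returns how many tasks actually ran.
int runAndShutDown(int n) {
    ToyExecutor exec;
    exec.startup();
    int ran = 0;  // written only by the worker, read only after join()
    for (int i = 0; i < n; ++i) exec.schedule([&ran] { ++ran; });
    exec.shutdown();  // step 1: refuse new work and wake the worker
    exec.join();      // step 2: wait for the worker to finish
    assert(!exec.joinable());
    return ran;
}
```

In this sketch, calling shutdown without join would leave a joinable std::thread to be destroyed, which calls std::terminate; calling join without shutdown would block forever. That is the reason both calls are needed here, and it is only an analogy for the concern raised about the real executors.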



 Comments   
Comment by Brett Nawrocki [ 01/Jun/23 ]

The following executors are all ThreadPools held as shared_ptrs by PrimaryOnlyService instances. These should be shut down correctly for the reasons described in this comment.

db/s/global_index/global_index_cloning_service.cpp
db/s/move_primary/move_primary_donor_service.cpp
db/s/move_primary/move_primary_recipient_service.cpp
db/s/resharding/resharding_coordinator_service.cpp
db/s/resharding/resharding_donor_service.cpp
db/s/resharding/resharding_recipient_service.cpp

The remaining file, client/server_discovery_monitor.cpp, is a unique case, but its executor should also be shut down correctly. The ThreadPoolTaskExecutor created here is used to initialize _executor, a shared_ptr member. ThreadPoolTaskExecutor should shut itself down properly in its destructor, calling both shutdown and join. This executor is used to initialize multiple child SingleServerDiscoveryMonitor instances; however, these are stored as shared pointers only in a map, which is destroyed when the ServerDiscoveryMonitor's destructor is called.
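The ownership argument above can be sketched roughly as follows. FakeExecutor, ChildMonitor, and ParentMonitor are hypothetical stand-ins, not the real classes, and the sketch assumes (as the comment states) that the executor's destructor calls both shutdown and join and that the children are referenced only from the parent's map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical executor that records what its destructor does.
struct FakeExecutor {
    explicit FakeExecutor(std::vector<std::string>& log) : log(log) {}
    ~FakeExecutor() {
        shutdown();
        join();
    }
    void shutdown() { log.push_back("shutdown"); }
    void join() { log.push_back("join"); }
    std::vector<std::string>& log;
};

// Hypothetical child monitor: shares ownership of the parent's executor.
struct ChildMonitor {
    explicit ChildMonitor(std::shared_ptr<FakeExecutor> exec) : executor(std::move(exec)) {}
    std::shared_ptr<FakeExecutor> executor;
};

// Hypothetical parent monitor: owns the executor and a map of children.
struct ParentMonitor {
    explicit ParentMonitor(std::shared_ptr<FakeExecutor> exec) : executor(std::move(exec)) {}
    void addChild(const std::string& host) {
        children[host] = std::make_shared<ChildMonitor>(executor);
    }
    std::shared_ptr<FakeExecutor> executor;
    std::map<std::string, std::shared_ptr<ChildMonitor>> children;
};

// Builds a parent with two children, destroys it, and returns the executor's log.
std::vector<std::string> destroyParentAndCollectLog() {
    std::vector<std::string> log;
    {
        ParentMonitor parent(std::make_shared<FakeExecutor>(log));
        parent.addChild("hostA");
        parent.addChild("hostB");
        assert(parent.executor.use_count() == 3);  // parent + two children
        assert(log.empty());                       // executor still alive
    }  // parent destroyed: map, children, then last executor reference released
    return log;
}
```

Because every reference to the executor lives inside the parent or its map in this sketch, destroying the parent is enough to run the executor's destructor. If some other object held a copy of the shared_ptr, the executor would outlive the parent.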

The ServerDiscoveryMonitor itself is held by StreamableReplicaSetMonitor, which in turn is created by ReplicaSetMonitorManager, which stores these instances in a map. The ReplicaSetMonitorManager is a decoration on the global ServiceContext.

ReplicaSetMonitorManager is eventually shut down via the ReplicaSetMonitor::shutdown method, which is called as part of process shutdown in mongod and mongos. When shut down, the map containing the child monitors (i.e. ultimately the ServerDiscoveryMonitors) is cleared.
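As a rough illustration of the last step, the sketch below shows a manager whose shutdown clears its map of monitors. ToyMonitor, ToyMonitorManager, and getOrCreate are hypothetical names chosen for this example and do not reflect the real ReplicaSetMonitorManager interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical monitor that tracks how many instances are alive.
struct ToyMonitor {
    explicit ToyMonitor(int& live) : live(live) { ++live; }
    ~ToyMonitor() { --live; }
    int& live;
};

// Hypothetical manager: stores monitors in a map keyed by replica set name.
class ToyMonitorManager {
public:
    explicit ToyMonitorManager(int& live) : _live(live) {}

    // Returns the existing monitor for setName, creating it if needed.
    std::shared_ptr<ToyMonitor> getOrCreate(const std::string& setName) {
        auto& slot = _monitors[setName];
        if (!slot) slot = std::make_shared<ToyMonitor>(_live);
        return slot;
    }

    // Clears the map; monitors with no other owners are destroyed here.
    void shutdown() { _monitors.clear(); }

    std::size_t size() const { return _monitors.size(); }

private:
    int& _live;
    std::map<std::string, std::shared_ptr<ToyMonitor>> _monitors;
};

// Creates two monitors, shuts the manager down, and returns the live count.
int liveMonitorsAfterShutdown() {
    int live = 0;
    ToyMonitorManager manager(live);
    manager.getOrCreate("rs0");
    manager.getOrCreate("rs1");
    manager.getOrCreate("rs0");  // reuses the existing entry
    assert(live == 2);
    assert(manager.size() == 2);
    manager.shutdown();
    return live;
}
```

Clearing the map destroys the monitors only when the map holds the last reference to each one, which is the situation the comment describes for process shutdown.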
