[SERVER-40633] Audit all uses of _replExecutor and check shutdown in scheduled tasks Created: 12/Apr/19  Updated: 06/Dec/22  Resolved: 23/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Backlog - Service Architecture
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-40499 ReplicationCoordinatorImpl::_handleHe... Closed
Gantt Dependency
has to be done after SERVER-40769 Untrack heartbeat callbacks on heartb... Closed
Related
is related to SERVER-40795 Always execute ThreadPoolTaskExecutor... Closed
is related to SERVER-39965 Make OutOfLineExecutor return a Statu... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Participants:
Linked BF Score: 5

 Description   

After SERVER-39965, a scheduled task may run on the caller's thread rather than in the thread pool on shutdown. Since the scheduled task can acquire the replication mutex, which the caller may already hold, a deadlock can occur. Heartbeat scheduling is an example.
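The hazard described above can be sketched with a toy model. Everything here is a hypothetical stand-in for illustration (ToyExecutor, ToyCoordinator, Status, Code are not the real MongoDB types), and the sketch assumes the callback can safely return on a non-OK status before taking the mutex, which ignores the _heartbeatHandles bookkeeping.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real MongoDB types.
enum class Code { OK, CallbackCanceled, ShutdownInProgress };

struct Status {
    Code code;
    bool isOK() const { return code == Code::OK; }
};

using Task = std::function<void(const Status&)>;

// Toy executor: after shutdown(), schedule() runs the task inline on the
// caller's thread with a non-OK status, mimicking the behavior in the ticket.
class ToyExecutor {
public:
    void shutdown() { _shutdown = true; }

    void schedule(Task task) {
        if (_shutdown) {
            task(Status{Code::ShutdownInProgress});  // inline, on the caller's thread
            return;
        }
        _queue.push_back(std::move(task));
    }

    // Stands in for the thread pool: runs queued tasks with an OK status.
    void runQueued() {
        std::vector<Task> pending;
        pending.swap(_queue);
        for (auto& task : pending) task(Status{Code::OK});
    }

private:
    bool _shutdown = false;
    std::vector<Task> _queue;
};

class ToyCoordinator {
public:
    explicit ToyCoordinator(ToyExecutor& exec) : _exec(exec) {}

    // Schedules a heartbeat while holding _mutex, like the caller in the ticket.
    void scheduleHeartbeat() {
        std::lock_guard<std::mutex> lk(_mutex);
        _exec.schedule([this](const Status& s) { _handleHeartbeat(s); });
    }

    int heartbeatsHandled() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _handled;
    }
    int heartbeatsSkipped() const { return _skipped.load(); }

private:
    void _handleHeartbeat(const Status& status) {
        // Check the status before touching _mutex. If the task was run inline,
        // the caller still holds _mutex and locking here would self-deadlock.
        if (!status.isOK()) {
            ++_skipped;
            return;
        }
        std::lock_guard<std::mutex> lk(_mutex);
        ++_handled;
    }

    ToyExecutor& _exec;
    mutable std::mutex _mutex;
    int _handled = 0;
    std::atomic<int> _skipped{0};
};
```

Without the early return in _handleHeartbeat, the second lock_guard would try to re-lock a std::mutex already held by the same thread once the executor has shut down, which is the self-deadlock the ticket is concerned with.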

Due to how heartbeat callbacks are tracked in _heartbeatHandles, it may not be straightforward to check the status and return before acquiring the lock.

Heartbeat cancellation must also be considered when working on this ticket. When a heartbeat is cancelled while it is already running, or while it is scheduled but has not run yet, the ownership of the callback should be clarified. Currently, it is the cancelled callback's job to clear itself.



 Comments   
Comment by Mira Carey [ 23/Apr/19 ]

Rather than going this route, we're going to fix up the thread pool task executor shutdown so that races in shutdown never result in calling methods inline.

See SERVER-40795

Comment by Benjamin Caimano (Inactive) [ 15/Apr/19 ]

Hey judah.schvimer, as of now, post-shutdown scheduling is the only situation where a task will be run directly on the caller's thread. (See here)

That said, please take your cue from the Status that you are passed. If the task is run via expected out of line execution, it will have an OK status. Otherwise, you have all the dangers one would assume in shutdown. You probably shouldn't make new clients or the like.

(The goal is for all cancellation to have CallbackCanceled as the status and all joining and post-shutdown scheduling to receive ShutdownInProgress. Sadly, we haven't gotten sharding all the way on board for this.)
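As a rough illustration of "taking your cue from the Status", a callback could branch on the code it receives. This is only a sketch: the two non-OK code names come from the comment above, while ErrorCode, TaskAction, and decideAction are hypothetical names, and the mapping to actions is an assumption about the stated goal, not current server behavior.

```cpp
// Hypothetical mirror of the status codes named in the comment above.
enum class ErrorCode { OK, CallbackCanceled, ShutdownInProgress, Other };

// Hypothetical actions a scheduled callback might take.
enum class TaskAction {
    RunNormally,    // executed out of line, as expected
    SkipCanceled,   // the task was cancelled; do no work
    BailOutInline,  // possibly running on the caller's thread during shutdown
    TreatAsError    // any other failure
};

TaskAction decideAction(ErrorCode code) {
    switch (code) {
        case ErrorCode::OK:
            return TaskAction::RunNormally;
        case ErrorCode::CallbackCanceled:
            return TaskAction::SkipCanceled;
        case ErrorCode::ShutdownInProgress:
            // Assume the usual shutdown dangers: take no locks the caller
            // might hold and create no new clients.
            return TaskAction::BailOutInline;
        default:
            return TaskAction::TreatAsError;
    }
}
```

The point of separating the two non-OK cases is that only ShutdownInProgress implies the task may be running inline on the caller's thread, so only that branch has to avoid the caller's locks.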

Comment by Judah Schvimer [ 15/Apr/19 ]

"a scheduled task may run on the caller's thread rather than in the thread pool on shutdown."

siyuan.zhou and ben.caimano, can you please clarify: is the task only run on the caller's thread during shutdown, or when it is cancelled, or is it more general than that? How do we decide given a status what action to take and what thread the task is run on?

Generated at Thu Feb 08 04:55:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.