[SERVER-60444] ThreadPool::waitForIdle's header comment should note that it can't be called concurrently with shutdown Created: 04/Oct/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-60782 ThreadPool::waitForIdle should not be... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Service Arch 2021-10-18
Participants:
Story Points: 2

 Description   

In SERVER-53477, we made the behavior of ThreadPool::waitForIdle match its documentation by making it safe to call before shutdown. To do so, we have waitForIdle return when the ThreadPool is shutting down, but with the caveat that there is no guarantee that work does or does not remain pending on the ThreadPool when waitForIdle returns due to shutdown. 

 -- 

When implementing that ticket, I expected that the idle-indicating condition variable would necessarily be notified by the ordinary mechanism  here during shutdown as all worker threads drain tasks from the pending work queue before stopping work here. However, because worker threads do not count themselves as idle when stopping due to shutdown, and therefore decrement ThreadPool::numIdleThreads before stopping, that condition may never obtain. The thread that first sees pendingTasks as empty after completing its task and re-aquiring the lock in ThreadPool::doOneTask can check this condition _after other worker threads have stopped and therefore decremented ThreadPool::_numIdleThreads, and therefore will not see _threads.size() == _numIdleThreads. More importantly, 

 -- 

We need to notify poolIsIdle separately when the ThreadPool enters state joinRequired to ensure that people who called waitForIdle _before shutdown are woken up on shutdown, regardless of the current status of the pending work queue or the threads that may be helping to drain it. 

 

update: after discussing with lingzhi.deng and schwerin  we realized providing waitForIdle with a safe-shutdown contract would have required more work and changes to the ThreadPool internals that were somewhat high-risk and unnecessary given the very few callers of waitForIdle. We've decided instead that the new caller of waitForIdle in the TenantOplogApplier should obey the same shutdown-pattern as the existing caller OplogApplier (see SERVER-60782). We'll now use this ticket to change the header comment for waitForIdle to indicate the safe usage of it requires that it is not called concurrently with shut-down or on an already shut-down pool 



 Comments   
Comment by George Wangensteen [ 18/Oct/21 ]

We should also revert the small change in SERVER-53477 as part of this.

Generated at Thu Feb 08 05:49:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.