waitForIdle was ported from the old ThreadPool when the current one was written, to serve the needs of a few highly constrained callers. The current ThreadPool was not designed to have waitForIdle called concurrently with shutdown or on a shutting-down thread pool; to use waitForIdle safely, the ThreadPool must remain in an un-shut-down state from the time waitForIdle is called until the pool idles (and waitForIdle returns).
We considered updating the ThreadPool to give waitForIdle a safe contract with regard to concurrent shutdown, but that would require non-trivial changes to the ThreadPool bookkeeping internals, which are somewhat high-risk because of the widespread use of the ThreadPool. Additionally, waitForIdle has only three non-test users (the TenantOplogApplier, the old OplogApplierImpl, and the DeferredWriter), and we plan to deprecate the current waitForIdle API in favor of a barrier-based approach. So we've decided not to change the ThreadPool internals and instead ensure that all callers follow the above safety guarantee for waitForIdle.
Currently, the TenantOplogApplier is the only piece of code that may call waitForIdle on a shutting-down or shut-down thread pool. This is unsafe and may lead to hangs. Instead, it would be better to follow the pattern in the old OplogApplierImpl, which joins any threads that may call waitForIdle before shutting down the thread pool, guaranteeing that waitForIdle cannot be called concurrently with shutdown or on a shut-down thread pool. This ticket tracks modifying TenantOplogApplier shutdown to use the safe pattern.
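For illustration, the safe ordering described above (join every thread that may call waitForIdle, and only then shut down the pool) could look roughly like the sketch below. SketchPool and runSafeShutdownDemo are hypothetical names invented for this example; SketchPool is a toy stand-in, not the real ThreadPool, and the applier thread is a simplified placeholder rather than the actual TenantOplogApplier or OplogApplierImpl code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy pool for illustration only; NOT the real ThreadPool.
class SketchPool {
public:
    explicit SketchPool(int numWorkers) {
        for (int i = 0; i < numWorkers; ++i)
            _workers.emplace_back([this] { _workerLoop(); });
    }

    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(!_shutdown);  // scheduling after shutdown is a caller bug here
        _queue.push_back(std::move(task));
        _workAvailable.notify_one();
    }

    // Contract mirrored from the ticket: the caller must keep the pool
    // un-shut-down from the time this is called until it returns.
    void waitForIdle() {
        std::unique_lock<std::mutex> lk(_mutex);
        assert(!_shutdown);
        _idle.wait(lk, [this] { return _queue.empty() && _active == 0; });
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
        _workAvailable.notify_all();
    }

    void join() {
        for (auto& w : _workers) w.join();
    }

private:
    void _workerLoop() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _workAvailable.wait(lk, [this] { return _shutdown || !_queue.empty(); });
            if (_queue.empty()) return;  // shut down and fully drained
            auto task = std::move(_queue.front());
            _queue.pop_front();
            ++_active;
            lk.unlock();
            task();
            lk.lock();
            --_active;
            if (_queue.empty() && _active == 0) _idle.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _workAvailable;
    std::condition_variable _idle;
    std::deque<std::function<void()>> _queue;
    int _active = 0;
    bool _shutdown = false;
    std::vector<std::thread> _workers;
};

// Runs the safe shutdown ordering and returns how many tasks were executed.
int runSafeShutdownDemo() {
    SketchPool pool(2);
    std::atomic<int> applied{0};

    // Stand-in for an applier thread: schedules a batch, then waits for idle.
    std::thread applier([&] {
        for (int batch = 0; batch < 4; ++batch) {
            for (int i = 0; i < 2; ++i)
                pool.schedule([&] { ++applied; });
            pool.waitForIdle();
        }
    });

    applier.join();   // 1. join every thread that may call waitForIdle
    pool.shutdown();  // 2. only now is it safe to shut the pool down
    pool.join();      // 3. wait for the worker threads to exit
    return applied.load();
}
```

Calling runSafeShutdownDemo() from a main function returns 8 (four batches of two tasks). In this sketch the applier thread finishes on its own; a real applier would presumably first be signalled to stop and then joined, which is an assumption about the eventual fix rather than something this ticket specifies.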
(In SERVER-60444 SA will update the comments in the code around waitForIdle to document this safety guarantee; apologies for not doing so earlier and thanks for your help with this!)
- related to: SERVER-60444 ThreadPool::waitForIdle's header comment should note that it can't be called concurrently with shutdown
- Backlog