[SERVER-60782] ThreadPool::waitForIdle should not be called concurrently with shutdown in TenantOplogApplier shutdown Created: 18/Oct/21  Updated: 29/Oct/23  Resolved: 25/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0, 5.1.0-rc3

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-60444 ThreadPool::waitForIdle's header comm... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1
Sprint: Repl 2021-11-01
Participants:
Linked BF Score: 21

 Description   

waitForIdle was ported from the old ThreadPool when the current one was written, to serve the needs of a few highly constrained callers. The current ThreadPool was not designed to have waitForIdle called concurrently with shutdown or on a shutting-down thread pool; to use waitForIdle safely, the ThreadPool must remain in a non-shut-down state from the time waitForIdle is called until the pool idles (and waitForIdle returns).

We considered updating the ThreadPool to give waitForIdle a safe contract with regard to concurrent shutdown, but doing so would require non-trivial changes to the ThreadPool bookkeeping internals, which are somewhat high-risk because of how widely the ThreadPool is used. Additionally, waitForIdle has only 3 non-test users: the TenantOplogApplier, the old OplogApplierImpl, and the DeferredWriter, and we plan to deprecate the current waitForIdle API in favor of a barrier-based approach. So we've decided not to change the ThreadPool internals and instead to ensure all callers follow the above safety guarantee for waitForIdle.

Currently, the TenantOplogApplier is the only piece of code that may call waitForIdle on a shutting-down or shut-down thread pool. This is unsafe and may lead to hangs. Instead, it would be better to follow the pattern in the old OplogApplierImpl, which joins any threads that may call waitForIdle before shutting down the thread pool, guaranteeing that waitForIdle cannot be called concurrently with shutdown or on a shut-down thread pool. This ticket tracks modifying TenantOplogApplier shutdown to use that safe pattern.
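
The ordering described above (join every thread that may call waitForIdle, and only then shut down the pool) can be illustrated with a small standalone sketch. This is not MongoDB code: ToyPool and runSafeShutdownDemo are hypothetical names invented for illustration, and the toy pool does not reproduce the real ThreadPool's bookkeeping or the hang itself. It only shows the shutdown ordering a caller could follow.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool; NOT MongoDB's ThreadPool.
class ToyPool {
public:
    explicit ToyPool(int numWorkers) {
        for (int i = 0; i < numWorkers; ++i)
            _workers.emplace_back([this] { _run(); });
    }

    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);
        _queue.push_back(std::move(task));
        _workCv.notify_one();
    }

    // Assumed contract (mirrors the ticket): call only while the pool is
    // guaranteed to stay un-shut-down until this function returns.
    void waitForIdle() {
        std::unique_lock<std::mutex> lk(_mutex);
        _idleCv.wait(lk, [this] { return _queue.empty() && _active == 0; });
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
        _workCv.notify_all();
    }

    void join() {
        for (auto& worker : _workers)
            worker.join();
    }

private:
    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _workCv.wait(lk, [this] { return _shutdown || !_queue.empty(); });
            if (_queue.empty())
                return;  // shut down and fully drained
            auto task = std::move(_queue.front());
            _queue.pop_front();
            ++_active;
            lk.unlock();
            task();
            lk.lock();
            --_active;
            if (_queue.empty() && _active == 0)
                _idleCv.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _workCv;
    std::condition_variable _idleCv;
    std::deque<std::function<void()>> _queue;
    std::vector<std::thread> _workers;
    int _active = 0;
    bool _shutdown = false;
};

// Runs four batches of two tasks each and returns how many tasks ran.
int runSafeShutdownDemo() {
    ToyPool pool(2);
    std::atomic<int> applied{0};

    // The applier thread is the only caller of waitForIdle.
    std::thread applier([&] {
        for (int batch = 0; batch < 4; ++batch) {
            for (int i = 0; i < 2; ++i)
                pool.schedule([&] { ++applied; });
            pool.waitForIdle();
        }
    });

    // Safe ordering: first join the thread that may call waitForIdle
    // (a real applier would also be signalled to stop here) ...
    applier.join();
    // ... and only then shut down and join the pool.
    pool.shutdown();
    pool.join();
    return applied.load();
}
```

Because the applier thread is joined before pool.shutdown() is called, no waitForIdle call can overlap with shutdown or run against a shut-down pool, which is the guarantee the ticket asks callers to uphold.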

(In SERVER-60444 SA will update the comments in the code around waitForIdle to document this safety guarantee; apologies for not doing so earlier and thanks for your help with this!)

 



 Comments   
Comment by Githook User [ 01/Nov/21 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-60782: ThreadPool::waitForIdle should not be called concurrently with shutdown in TenantOplogApplier shutdown

(cherry picked from commit 90a82a5938e5655e283518feb29c92bdb490bb9d)
Branch: v5.1
https://github.com/mongodb/mongo/commit/c4686a8ae4ac666cc10d25484abe003ab21fb835

Comment by Githook User [ 25/Oct/21 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-60782: ThreadPool::waitForIdle should not be called concurrently with shutdown in TenantOplogApplier shutdown
Branch: master
https://github.com/mongodb/mongo/commit/90a82a5938e5655e283518feb29c92bdb490bb9d

Comment by George Wangensteen [ 18/Oct/21 ]

CC lingzhi.deng schwerin 
