[SERVER-53410] No need to shutdown writerPool when interrupting recipient service instances Created: 17/Dec/20  Updated: 29/Oct/23  Resolved: 17/Dec/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: pm-1791_milestone-B
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53312 Enable recipient testing for tenant_m... Closed
Related
is related to SERVER-53477 ThreadPool::waitForIdle should be int... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-12-28
Participants:

 Description   

The header file claims that it is legal to call waitForIdle before shutdown is called. But it is not. On shutdown, the thread pool drains all pending tasks and shuts down all threads, after which _numIdleThreads becomes 0. So if we call shutdown and then waitForIdle, waitForIdle hangs because _numIdleThreads (0) stays < the size of _thread (until join is called).
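To make the counter argument above concrete, here is a minimal single-threaded sketch. It is a toy model, not the server's ThreadPool code: ToyPoolState, idleConditionHolds and makeIdlePool are hypothetical names, and the idle condition is only assumed from the description above (no pending tasks and the idle-thread count not below the thread-list size).

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical, not the real ThreadPool) of the counters
// discussed in this ticket. No real threads are created.
struct ToyPoolState {
    std::size_t threadCount = 0;     // stands in for the size of the thread list
    std::size_t numIdleThreads = 0;  // stands in for _numIdleThreads
    std::size_t pendingTasks = 0;

    // Assumed condition a waitForIdle-style call blocks on.
    bool idleConditionHolds() const {
        return pendingTasks == 0 && numIdleThreads >= threadCount;
    }

    // shutdown: tasks are drained and workers exit, so none are idle any
    // more, but the exited threads are not yet removed from the list.
    void shutdown() {
        pendingTasks = 0;
        numIdleThreads = 0;
    }

    // join: the exited threads are reaped, so the thread list becomes empty.
    void join() {
        threadCount = 0;
    }
};

// Builds a pool with n threads, all idle and no pending work.
ToyPoolState makeIdlePool(std::size_t n) {
    ToyPoolState p;
    p.threadCount = n;
    p.numIdleThreads = n;
    return p;
}
```

In this model the condition holds for an idle pool, stops holding after shutdown() alone, and holds again only after join(), which is the window in which a waiter would be stuck.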

However, in tenant migration, we call _writerPool->shutdown() without join on interrupt and rely on the _tenantOplogApplier to be able to interrupt itself based on interrupt errors. We only join after all components have been interrupted, in the last cleanup stage. That means if _tenantOplogApplier is at waitForIdle, it will hang and fail to shut down even though we have already called shutdown on the _writerPool.

In fact, I don't think we need to shut down the _writerPool when interrupting a recipient instance. We can let the oplog applier finish applying the current batch. If it is able to finish the current batch, it will stop on hitting _shouldStopApplying. If instead we get errors applying the current batch due to shutdown/stepdown, the oplog applier will also exit. So shutting down _writerPool during interrupt is unnecessary, and not shutting it down also works around the bug mentioned above.
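The two exit paths described above can be sketched as a plain loop. This is only an illustration under stated assumptions, not the actual TenantOplogApplier: runToyApplier, ToyApplierResult and ExitReason are hypothetical names, the stop flag is assumed to be checked between batches, and a batch error is modelled as applyBatch returning false.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Why the toy applier loop ended (hypothetical names).
enum class ExitReason { kDrained, kStopRequested, kBatchError };

struct ToyApplierResult {
    ExitReason reason;
    std::size_t batchesApplied;
};

// Applies up to numBatches batches without touching any thread pool.
// shouldStopApplying is consulted before each batch; applyBatch returns
// false to model an error such as shutdown/stepdown during application.
ToyApplierResult runToyApplier(std::size_t numBatches,
                               const std::function<bool(std::size_t)>& applyBatch,
                               const std::function<bool()>& shouldStopApplying) {
    std::size_t applied = 0;
    for (std::size_t i = 0; i < numBatches; ++i) {
        if (shouldStopApplying()) {
            // Interrupt was requested; the previous batch already finished.
            return {ExitReason::kStopRequested, applied};
        }
        if (!applyBatch(i)) {
            // The current batch failed, so the applier exits on the error.
            return {ExitReason::kBatchError, applied};
        }
        ++applied;
    }
    return {ExitReason::kDrained, applied};
}
```

If the stop flag is set while a batch is being applied, that batch still completes and the loop exits at the next check. Neither exit path needs the pool to be shut down first.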



 Comments   
Comment by Githook User [ 17/Dec/20 ]

Author: Lingzhi Deng (ldennis) <lingzhi.deng@mongodb.com>

Message: SERVER-53410: No need to shutdown writerPool when interrupting recipient service instances
Branch: master
https://github.com/mongodb/mongo/commit/78026dc407a66a7f4a642c6c6962f0dbd03f2dab

Generated at Thu Feb 08 05:30:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.