[SERVER-65979] Fix issue with python multiprocess shutdown Created: 26/Apr/22 Updated: 21/Aug/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shreyas Kalyan | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Server Tooling & Methods
|
||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 35 | ||||
| Description |
|
In BF-24933, when mongod crashes and the test runner treats it as a failure (starting up the hang analyzer), the hang analyzer tries to kill the PyKMIP process using proc.kill(), which sends a SIGKILL to it. However, PyKMIP uses multiprocessing and has created two child processes, which the hang analyzer fails to kill. We need to modify the call to proc.kill() so that it first finds and kills the child processes and then kills the parent process. |
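The proposed change (find the children first, then kill the parent) could look roughly like the sketch below. The ticket does not show the hang analyzer's code, so this is not its implementation: the function names `parse_ps_output`, `descendants`, and `kill_process_tree` are hypothetical, and the sketch uses only the standard library plus the POSIX `ps` command to discover child processes. It also assumes no PIDs are reused between listing and killing.

```python
import os
import signal
import subprocess
from collections import deque


def parse_ps_output(text):
    """Build a {parent pid: [child pids]} map from `ps -o pid= -o ppid=` output."""
    children = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid, ppid = int(parts[0]), int(parts[1])
        children.setdefault(ppid, []).append(pid)
    return children


def descendants(root_pid, children):
    """Return every descendant of root_pid in breadth-first order."""
    found, seen, queue = [], {root_pid}, deque([root_pid])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:  # guard against self-parented entries
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


def kill_process_tree(root_pid, sig=signal.SIGKILL):
    """Signal all descendants of root_pid first, then root_pid itself."""
    listing = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "ppid="],
        capture_output=True, text=True, check=True,
    ).stdout
    targets = descendants(root_pid, parse_ps_output(listing)) + [root_pid]
    for pid in targets:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # process already exited
    return targets
```

Because the full list of PIDs is collected before any signal is sent, children that get reparented when an ancestor dies are still signalled. A process-inspection library that exposes a recursive child lookup would replace the `ps` parsing, but the ordering (children, then parent) would stay the same.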
| Comments |
| Comment by Iryna Zhuravlova [ 03/Jun/22 ] |
|
Hi shreyas.kalyan@mongodb.com, thank you for tracking down this issue and taking the time to explain it in more detail. Sure, feel free to write up a quick PR. |
| Comment by Shreyas Kalyan [ 10/May/22 ] |
|
iryna.zhuravlova@mongodb.com I apologize, I did not fully explain the issue that was occurring in the BF ticket. This issue is not causing the BF. The BF has been fixed by The issue that we are observing here is that there is a process called PyKMIP that we start from our JS tests. PyKMIP spawns its own child processes. If the test hangs for whatever reason (sometimes there is an issue and we need the hang analyzer to kick in), the hang analyzer will perform its tasks, the last of which is cleaning up all processes. During that cleanup, the analyzer fails to kill the child processes of PyKMIP on macOS. Because the child processes are still running, resmoke fails to exit and hangs for 2 hours until the task times out and kills all processes, children included. What I want from this ticket is to avoid waiting the 2 hours, because that wastes compute time on the macOS machines that could be better spent running other tasks. The line I want to change is here. I could write up a quick PR if that would be helpful? |
| Comment by Iryna Zhuravlova [ 09/May/22 ] |
|
shreyas.kalyan@mongodb.com The hang analyzer is not optimized to handle Python processes. Can we increase the timeout on macOS instead? |