[SERVER-48068] assert.soon() with hang analyzer enabled inside a try/finally can lead to an invariant failure in ProgramRegistry Created: 08/May/20  Updated: 29/Oct/23  Resolved: 26/Jun/20

Status: Closed
Project: Core Server
Component/s: Shell, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.13

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Richard Samuels (Inactive)
Resolution: Fixed Votes: 0
Labels: tig-hanganalyzer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-43153 Expose pids of spawned processes in t... Closed
Related
related to SERVER-49347 Fix locking in shell's ProgramRegistry Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Steps To Reproduce:

The sleep(5000) is to ensure the parallel shell has already exited by the time MongoRunner.runningChildPids() is called.

./mongo --nodb --eval 'var cleanup = startParallelShell("1", undefined, true); sleep(5000); try { MongoRunner.runningChildPids(); throw new Error('Simulating assert.soon() failure'); } finally { cleanup() }'

Sprint: STM 2020-06-29, STM 2020-07-13
Participants:
Linked BF Score: 50

 Description   

The MongoRunner.runningChildPids() function calls through to getRunningMongoChildProcessIds() in shell_utils_launcher.cpp, which in turn calls wait_for_pid() on all processes in ProgramRegistry::_registeredPids. A JavaScript test of the form

let cleanup;
try {
    cleanup = startParallelShell(...);
    assert.soon(...);
    ...
} finally {
    cleanup();
}

(for example) would end up calling wait_for_pid() multiple times. The first call to wait_for_pid() via getRunningMongoChildProcessIds() would unregister the pid from ProgramRegistry::_registeredPids and the second call would trigger this invariant.



 Comments   
Comment by Githook User [ 01/Feb/22 ]

Author:

{'name': 'Richard Samuels', 'email': 'richard.l.samuels@gmail.com', 'username': 'richardsamuels'}

Message: Revert "SERVER-48068 Fix handling of program registry and wait_for_pid in mongo shell"

This reverts commit dc354f0715784aff08d407f095eb9fd65217411f.
Branch: v4.4
https://github.com/mongodb/mongo/commit/f1aea9cfb4ac291a4f7f5cdd2a10bffdeb1eaa39

Comment by Githook User [ 31/Jan/22 ]

Author:

{'name': 'Richard Samuels', 'email': 'richard.samuels@mongodb.com', 'username': 'richardsamuels'}

Message: SERVER-48068 Fix handling of program registry and wait_for_pid in mongo shell

(cherry picked from commit 1af88ecaa952995d44e545708633196b2cd2bbb2)
Branch: v4.4
https://github.com/mongodb/mongo/commit/dc354f0715784aff08d407f095eb9fd65217411f

Comment by Githook User [ 26/Jun/20 ]

Author:

{'name': 'Richard Samuels', 'email': 'richard.samuels@mongodb.com', 'username': 'richardsamuels'}

Message: SERVER-48068 Fix handling of program registry and wait_for_pid in mongo shell
Branch: master
https://github.com/mongodb/mongo/commit/1af88ecaa952995d44e545708633196b2cd2bbb2

Comment by Max Hirschhorn [ 21/May/20 ]

In the general case, calling wait_for_pid() more than once is problematic because once waitpid() has returned the child process's exit code to the parent process (or the last Handle object has been closed on Windows), it becomes possible for that pid to be reused. In this particular case, we're expecting the mongo shell process to exit shortly after assert.soon() has failed. I think it is acceptable to do as you've proposed.

Comment by Brooke Miller [ 18/May/20 ]

Hey Max, Robert and I reviewed this. Does it seem appropriate to call a version of wait_for_pid() that doesn't un-register the pid?

Generated at Thu Feb 08 05:16:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.