[SERVER-43256] Fix incorrect uses of assert.soon and make hang-analysis call exit Created: 10/Sep/19  Updated: 29/Oct/23  Resolved: 31/Oct/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: Backlog
Fix Version/s: 4.3.1

Type: New Feature Priority: Major - P3
Reporter: Ryan Timmons Assignee: Robert Guo (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-26867 Timeout-related assert.soon failures ... Closed
Backwards Compatibility: Fully Compatible
Sprint: STM 2019-10-31
Participants:
Story Points: 3

 Description   

A number of places use assert.soon as a retry mechanism (e.g. ssl_test.js), and at least one test (index_delete.js) runs assert.soon in a try/finally block to capture better error messages.

↑ looks like this:

    try {
        assert.soon(function() {
            return checkProgram(serverPID).alive &&
                (0 === _runMongoProgram.apply(null, clientArgv));
        }, "connect failed", connectTimeoutMillis);
    } catch (ex) {
        return false;
    } finally {
        _stopMongoProgram(this.port);
    }

In the cases where assert.soon fails due to timeout, we want to instead run hang-analysis and exit rather than returning control to the caller.

(Comment from Robert: I think assert.soon almost always fails due to a timeout. What's the reason for exiting in this case?)

Uses of assert.soon are pervasive in jstests so do a best-effort fix here. Could modify assert.soon in a patch-build to throw immediately and any test that doesn't fail is likely using it incorrectly.

If there are more than (say) 3-4 cases of using assert.soon as a retry mechanism, create a helper assert.retry method (name/location tbd) that has a similar signature to assert.soon but doesn't call hang-analyzer or barf if the callback is never truthy and instead returns a [success, error] array where success is the last result of the callback and error is any errors thrown by the callback. (Exact signature tbd depending on how it's used by callers of extant assert.soon.)

Finally, once this is done, modify existing callers of the hang-analyzer (probably just assert.soon) to call exit after running hang-analysis.



 Comments   
Comment by Githook User [ 31/Oct/19 ]

Author:

{'name': 'Robert Guo', 'email': 'robert.guo@mongodb.com'}

Message: SERVER-43256 When assert.soon() is expected to fail, don't call the hang analyzer
Branch: master
https://github.com/mongodb/mongo/commit/c046a5896652acea84c9db1d9346a43b2745a37b

Comment by Ryan Timmons [ 25/Oct/19 ]

Be suspicious any time there's an assert inside of an assert.throws e.g. here

Comment by Ryan Timmons [ 24/Oct/19 ]

A patch-test of calling hang-analyzer from assert.js is here. Most of the reds are a result of the code in my last comment.

Comment by Ryan Timmons [ 24/Oct/19 ]

One particularly obvious case of using assert.soon incorrectly is here:
https://github.com/mongodb/mongo/blob/c119ef45e3f1c2cc7f36c7b81d0731e332461c2b/jstests/replsets/log_secondary_oplog_application.js#L41-L44
The checkLog.contains function relies on assert.soon to see if a message is in the log. This test is trying to assert that a message is not in the logs, so it asserts that the assert.soon throws an exception. So if the hang-analyzer takes a while there could be an election and the rest of the test fails... Solution for this particular case is to just sleep for the timeout that assert.soon would normally wait throughout its retries and then call checkContainsOnce and assert it's false

Comment by Max Hirschhorn [ 11/Sep/19 ]

Finally, once this is done, modify existing callers of the hang-analyzer (probably just assert.soon) to call exit after running hang-analysis.

I would think we'd still want the mongo shell to log the JavaScript stacktrace along with the "failed to load" message picked up by the Build Baron tooling of where the exception from assert.soon() came from.

Generated at Thu Feb 08 05:02:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.