[SERVER-31013] Make serverExitCodeMap useful to detect if server crashed on startup before connection established Created: 08/Sep/17  Updated: 30/Oct/23  Resolved: 18/Apr/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 3.5.12
Fix Version/s: 3.6.6, 3.7.6

Type: Task Priority: Major - P3
Reporter: Randolph Tan Assignee: Robert Guo (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-27549 Log a message in MongoRunner.stopMong... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6
Sprint: TIG 2018-04-23
Participants:
Linked BF Score: 56

 Description   

It is possible for the same pid to be assigned to processes spawned at different points in time during a test. Because serverExitCodeMap is keyed by pid, checking the exit code in MongoRunner.stopMongod() may return a stale exit code recorded for an earlier process instead of the exit code of the process that was just spawned.
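
The pid reuse problem can be illustrated with a minimal sketch. The map and the recordExit helper below are hypothetical stand-ins, not the shell's actual implementation, and the pid value is made up for illustration.

```javascript
// Hypothetical stand-in for a pid-keyed exit code map (not the real shell internals).
const exitCodesByPid = new Map();

// Records the exit code observed for a process, keyed by its pid.
function recordExit(pid, exitCode) {
    exitCodesByPid.set(pid, exitCode);
}

// A first process crashes on startup with exit code 100.
recordExit(4242, 100);

// Later the OS assigns pid 4242 to a new, unrelated process that is still running.
// Nothing cleared the old entry, so a lookup by pid returns the earlier process's code.
const reusedPid = 4242;
const staleExitCode = exitCodesByPid.get(reusedPid);
console.log("exit code seen for reused pid:", staleExitCode);
```

The lookup succeeds even though the new process has not exited, which is how a check keyed only by pid can attribute an exit code to the wrong process.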


Original description

MongoRunner.stopMongod should explicitly check for exit code 100 on instances where it is expected to fail with non-zero exit code.
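
One way to read the original request is that each call site states which exit codes it expects and anything else is treated as a failure. The sketch below assumes a hypothetical checkExitCode helper; it is not part of MongoRunner and only illustrates the idea.

```javascript
// Hypothetical helper: throws unless the observed exit code is one the caller allowed.
// By default only a clean exit (0) is accepted.
function checkExitCode(actual, allowed = [0]) {
    if (!allowed.includes(actual)) {
        throw new Error("unexpected exit code " + actual +
                        "; allowed: " + JSON.stringify(allowed));
    }
    return actual;
}

// A test that expects the server to fail with exit code 100 says so explicitly.
checkExitCode(100, [100]);

// A test that expects a clean shutdown relies on the default.
checkExitCode(0);
```

Keeping the allowed codes at the call site means an exit code of 100 still fails any test that did not ask for it.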



 Comments   
Comment by Githook User [ 19/Jun/18 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-31013 use port instead of pid for serverExitCodeMap
Branch: v3.6
https://github.com/mongodb/mongo/commit/85998865b76fbca935a29fcea14409780945db6e

Comment by Githook User [ 19/Apr/18 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-31013 fix exit code for Windows
Branch: master
https://github.com/mongodb/mongo/commit/8652fc54ad19affc3200a9e532dddb45628b60ef

Comment by Githook User [ 18/Apr/18 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-31013 use port instead of pid for serverExitCodeMap
Branch: master
https://github.com/mongodb/mongo/commit/378796ec67c575c628b80d34e285dc6fd110a48f

Comment by Max Hirschhorn [ 30/Mar/18 ]

Based on Samy's comment, I think we might be able to make the serverExitCodeMap useful for checking the return code of a mongod or mongos process that fails to start up by changing the key from the pid of the process to the port of the process. The issue with using the pid is that, while it is unique for that attempt to spawn the process, the caller of MongoRunner.runMongod() cannot learn the pid because the function returns null rather than a Mongo connection object.
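
A rough sketch of why a port key helps: the caller chooses the port up front, so it can still look up the exit code when the launcher returns null. The runServer function and the map below are hypothetical and only mimic the behaviour described for MongoRunner.runMongod(); the port number is arbitrary.

```javascript
// Hypothetical port-keyed exit code map (illustrative, not the shell's real API).
const exitCodesByPort = new Map();

// Simulates a launcher that returns null when the server exits before a
// connection is established, and a connection-like object otherwise.
function runServer(port, exitCodeOnStartup) {
    if (exitCodeOnStartup !== undefined) {
        exitCodesByPort.set(port, exitCodeOnStartup);
        return null;
    }
    // A successful start on this port clears any entry left by an earlier process.
    exitCodesByPort.delete(port);
    return {port: port};
}

const port = 20000;
const conn = runServer(port, 100);  // the server crashes on startup

// conn is null, but the caller already knows the port it asked for.
const observedExitCode = exitCodesByPort.get(port);
console.log("startup exit code for port", port, "is", observedExitCode);
```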

Comment by Samyukta Lanka [ 30/Mar/18 ]

In the linked BF-8591, replTest.start is failing because the node fasserts before start is called and waitForConnect was set to true in SERVER-26601. This happens in three places:
https://github.com/mongodb/mongo/blob/f1bb8e1389d6aa458e5100f3eb753b80b1a4e4e7/jstests/replsets/rollback_too_new.js#L58
https://github.com/mongodb/mongo/blob/f1bb8e1389d6aa458e5100f3eb753b80b1a4e4e7/jstests/replsets/rollback_collMod_fatal.js#L66
https://github.com/mongodb/mongo/blob/f1bb8e1389d6aa458e5100f3eb753b80b1a4e4e7/jstests/replsets/rollback_cmd_unrollbackable.js#L77

It can be fixed by surrounding those lines in a try/catch:

try {
    c_conn = replTest.start(CID, {waitForConnect: true}, true /*restart*/);
} catch (e) {
    // Expected: the node fasserts during startup, so no connection can be established.
}

Comment by Max Hirschhorn [ 22/Mar/18 ]

"From Max's description in the linked BF-6118 it looks like MongoRunner should explicitly ignore code 100 always due to the way PID/Port mapping is tracked. Passing this ticket on to TIG."

I don't think we should have MongoRunner always ignore an exit code of 100, as it undermines the ability to have tests explicitly state which processes should exit with which return codes. Looking back at some of the discussion in SERVER-27549, I think the serverExitCodeMap is misguided because MongoRunner.runMongod() returns null if we cannot establish a connection to the server before it exits. Returning null means that the caller never learns what the pid is and therefore cannot call MongoRunner.stopMongod() to assert on the exit code. Relying on the "Could not start mongo program ... process ended" and "MongoDB process ... intentionally exited with error code" messages seems good enough to me to indicate that a preceding backtrace may not be the reason the test failed.

MongoRunner.stopMongod = function(conn, signal, opts) {
    if (!conn.pid) {
        throw new Error("first arg must have a `pid` property; " +
                        "it is usually the object returned from MongoRunner.runMongod/s");
    }
 
    if (!conn.port) {
        throw new Error("first arg must have a `port` property; " +
                        "it is usually the object returned from MongoRunner.runMongod/s");
    }
 
    ...
}

Comment by Kaloian Manassiev [ 21/Mar/18 ]

From Max's description in the linked BF-6118 it looks like MongoRunner should explicitly ignore code 100 always due to the way PID/Port mapping is tracked. Passing this ticket on to TIG.
