If we expect tests to disable their own failpoints, then any test using a failpoint that could lead to a shutdown hang will have to surround the entire test with a try-finally statement.
It seems like SERVER-27280 was proposing that failpoints should automatically check if the server is shutting down. I'm not sure how that proposal will turn out.
I think a better design, if possible, would be to automatically wrap tests in try-finally blocks and disable any problematic failpoints on failure.
It does tend to be very complicated to write a robust try/catch/finally block. Inlined below is an example of how we handle {fsync: 1, lock} and {fsyncUnlock: 1} in ReplSetTest. I'd be interested in only needing write such a structure once and having other tests able to take advantage of it. A Promise-like construct would potentially make the try/catch/finally easier to express in the test, but I haven't thought deeply enough about it.
var activeException = false;
|
|
try {
|
// Lock the primary to prevent the TTL monitor from deleting expired documents in
|
// the background while we are getting the dbhashes of the replica set members.
|
assert.commandWorked(primary.adminCommand({fsync: 1, lock: 1}),
|
'failed to lock the primary');
|
this.awaitReplication(60 * 1000 * 5);
|
checkerFunction.apply(this, checkerFunctionArgs);
|
} catch (e) {
|
activeException = true;
|
throw e;
|
} finally {
|
// Allow writes on the primary.
|
var res = primary.adminCommand({fsyncUnlock: 1});
|
|
if (!res.ok) {
|
var msg = 'failed to unlock the primary, which may cause this' +
|
' test to hang: ' + tojson(res);
|
if (activeException) {
|
print(msg);
|
} else {
|
throw new Error(msg);
|
}
|
}
|
}
|
|
|
If we expect tests to disable their own failpoints, then any test using a failpoint that could lead to a shutdown hang will have to surround the entire test with a try-finally statement. Is that a design pattern we want in our tests? I think a better design, if possible, would be to automatically wrap tests in try-finally blocks and disable any problematic failpoints on failure.
|
|
judah.schvimer, why shouldn't it be the test's responsibility to disable the failpoint if it enabled the failpoint? If you wanted the failpoint to be disabled before the mongo shell terminated the spawned processes, then the portion of the test after the failpoint was enabled should be wrapped in a try/finally block.
The mongo shell currently doesn't expose a way to inject additional logic into its "atexit" handler, and I could see situations where attempting to communicate with a MongoDB process when the test has failed could lead to the mongo shell hanging rather than exiting cleanly with a non-zero return code (as it would have done before attempting to disable failpoints automatically). If we decide to pursue this idea, I'd like to ensure that it also handles the fsync+lock scenario described in SERVER-20773.
|