-
Type: Improvement
-
Resolution: Fixed
-
Priority: Trivial - P5
-
Affects Version/s: None
-
Component/s: None
-
None
-
Fully Compatible
-
QE 2021-08-09
-
127
There are two timing errors in out_max_time_ms.js that can occasionally lead to BFs. These two lines are highlighted here and marked with L2 and L3 below:
/* >>> L1: */const awaitShell = startParallelShell(shellStr, conn.port); /* >>> L2: */ waitForCurOpByFailPointNoNS(failPointConn.getDB("admin"), failPointName); /* >>> L3: */ assert.commandWorked(maxTimeMsConn.getDB("admin").runCommand( {configureFailPoint: "maxTimeNeverTimeOut", mode: "off"})); // The aggregation running in the parallel shell will hang on the failpoint, burning // its time. Wait until the maxTimeMS has definitely expired. sleep(maxTimeMS + 2000); // Now drop the failpoint, allowing the aggregation to proceed. It should hit an // interrupt check and terminate immediately. assert.commandWorked( failPointConn.getDB("admin").runCommand({configureFailPoint: failPointName, mode: "off"})); // Wait for the parallel shell to finish. assert.eq(awaitShell(), 0);
L2 and L3 have a race condition with L1, which occurs rarely.
Suggested solution #1 to decrease the probability of getting into this BF again:
diff --git a/jstests/noPassthrough/out_max_time_ms.js b/jstests/noPassthrough/out_max_time_ms.js index 36268ff645..0212c30a7e 100644 --- a/jstests/noPassthrough/out_max_time_ms.js +++ b/jstests/noPassthrough/out_max_time_ms.js @@ -34,7 +34,7 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires( // Use a short maxTimeMS so that the test completes in a reasonable amount of time. We will // use the 'maxTimeNeverTimeOut' failpoint to ensure that the operation does not prematurely // time out. - const maxTimeMS = 1000 * 2; + const maxTimeMS = 1000 * 4; // Enable a failPoint so that the write will hang. const failpointCommand = { @@ -66,6 +66,8 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires( shellStr += `(${runAggregate.toString()})();`; const awaitShell = startParallelShell(shellStr, conn.port); + sleep(1000); + waitForCurOpByFailPointNoNS(failPointConn.getDB("admin"), failPointName); assert.commandWorked(maxTimeMsConn.getDB("admin").runCommand( @@ -73,7 +75,7 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires( // The aggregation running in the parallel shell will hang on the failpoint, burning // its time. Wait until the maxTimeMS has definitely expired. - sleep(maxTimeMS + 2000); + sleep(maxTimeMS + 4000); // Now drop the failpoint, allowing the aggregation to proceed. It should hit an // interrupt check and terminate immediately.
Suggested solution #2: improve L2 and L3 to make sure there wouldn't be any race conditions.
- related to
-
SERVER-60586 out_max_time_ms.js does not correctly enable "maxTimeNeverTimeOut" failpoint leading to spurious test failure
- Closed