Details
- Improvement
- Resolution: Fixed
- Trivial - P5
- Fully Compatible
- QE 2021-08-09
- 127
Description
There are two timing errors in out_max_time_ms.js that can occasionally lead to BFs. The two offending lines are marked L2 and L3 below; L1 marks the start of the parallel shell they race with:
/* >>> L1: */ const awaitShell = startParallelShell(shellStr, conn.port);

/* >>> L2: */ waitForCurOpByFailPointNoNS(failPointConn.getDB("admin"), failPointName);

/* >>> L3: */ assert.commandWorked(maxTimeMsConn.getDB("admin").runCommand(
    {configureFailPoint: "maxTimeNeverTimeOut", mode: "off"}));

// The aggregation running in the parallel shell will hang on the failpoint, burning
// its time. Wait until the maxTimeMS has definitely expired.
sleep(maxTimeMS + 2000);

// Now drop the failpoint, allowing the aggregation to proceed. It should hit an
// interrupt check and terminate immediately.
assert.commandWorked(
    failPointConn.getDB("admin").runCommand({configureFailPoint: failPointName, mode: "off"}));

// Wait for the parallel shell to finish.
assert.eq(awaitShell(), 0);
The calls at L2 and L3 race with the parallel shell started at L1. The race is rarely hit, which is why the failure is only occasional.
Suggested solution #1 to decrease the probability of getting into this BF again:
diff --git a/jstests/noPassthrough/out_max_time_ms.js b/jstests/noPassthrough/out_max_time_ms.js
index 36268ff645..0212c30a7e 100644
--- a/jstests/noPassthrough/out_max_time_ms.js
+++ b/jstests/noPassthrough/out_max_time_ms.js
@@ -34,7 +34,7 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires(
 // Use a short maxTimeMS so that the test completes in a reasonable amount of time. We will
 // use the 'maxTimeNeverTimeOut' failpoint to ensure that the operation does not prematurely
 // time out.
- const maxTimeMS = 1000 * 2;
+ const maxTimeMS = 1000 * 4;

 // Enable a failPoint so that the write will hang.
 const failpointCommand = {
@@ -66,6 +66,8 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires(
 shellStr += `(${runAggregate.toString()})();`;
 const awaitShell = startParallelShell(shellStr, conn.port);

+ sleep(1000);
+
 waitForCurOpByFailPointNoNS(failPointConn.getDB("admin"), failPointName);

 assert.commandWorked(maxTimeMsConn.getDB("admin").runCommand(
@@ -73,7 +75,7 @@ function forceAggregationToHangAndCheckMaxTimeMsExpires(

 // The aggregation running in the parallel shell will hang on the failpoint, burning
 // its time. Wait until the maxTimeMS has definitely expired.
- sleep(maxTimeMS + 2000);
+ sleep(maxTimeMS + 4000);

 // Now drop the failpoint, allowing the aggregation to proceed. It should hit an
 // interrupt check and terminate immediately.
Suggested solution #2: rework L2 and L3 so that they cannot race with the parallel shell started at L1, which would remove the race instead of only making it less likely.
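The ticket does not spell out what solution #2 would look like. One possible reading is to replace fixed sleeps with polling for an observable condition, so that the test proceeds only once the other shell has provably reached the expected state. The sketch below illustrates that pattern in plain JavaScript under stated assumptions: the helper name pollUntil, its options, and the simulated reachedFailpoint condition are all hypothetical and are not taken from out_max_time_ms.js or the jstests libraries. In the mongo shell, the built-in assert.soon provides comparable polling behaviour.

```javascript
// Hypothetical helper for illustration only; not part of the mongo shell or jstests.
// Polls `predicate` until it returns true instead of sleeping for a fixed interval
// and hoping the other thread has reached the expected state by then.
function pollUntil(
    predicate, {timeoutMs = 10000, intervalMs = 100, now = Date.now, pause = busyWait} = {}) {
    const deadline = now() + timeoutMs;
    let attempts = 0;
    while (true) {
        attempts += 1;
        if (predicate()) {
            return attempts;  // Number of checks needed before the condition held.
        }
        if (now() >= deadline) {
            throw new Error(`condition not met within ${timeoutMs} ms (${attempts} attempts)`);
        }
        pause(intervalMs);
    }
}

// Fallback pause for plain JavaScript runtimes. In the mongo shell, the built-in
// sleep(ms) could be passed as the `pause` option instead.
function busyWait(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) {
        // Spin until the interval has elapsed.
    }
}

// Simulated condition (an assumption for the demo): the "parallel shell" is
// observed at the failpoint on the third check.
let checks = 0;
const reachedFailpoint = () => ++checks >= 3;

const attempts = pollUntil(reachedFailpoint, {timeoutMs: 1000, intervalMs: 10});
console.log(`condition observed after ${attempts} attempts`);
```

The trade-off against solution #1 is that polling bounds the wait by a timeout rather than guessing a sleep length, so a slow machine lengthens the test run but does not cause it to fail spuriously.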
Issue Links
- related to SERVER-60586 out_max_time_ms.js does not correctly enable "maxTimeNeverTimeOut" failpoint leading to spurious test failure (Closed)