[SERVER-56318] timeseries_sample.js can fail spuriously due to pseudorandomness Created: 23/Apr/21  Updated: 29/Oct/23  Resolved: 26/Apr/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

This can be reproduced reliably by applying the following patch to the test, and then running it locally with resmoke.py:

diff --git a/jstests/noPassthrough/timeseries_sample.js b/jstests/noPassthrough/timeseries_sample.js
index 71110731a0..d2dd6f773a 100644
--- a/jstests/noPassthrough/timeseries_sample.js
+++ b/jstests/noPassthrough/timeseries_sample.js
@@ -117,9 +117,12 @@ let runSampleTests = (measurementsPerBucket, backupPlanSelected) => {
     assertUniqueDocuments(result);
 
     // Check that we have executed the correct branch of the TrialStage.
-    const optimizedSamplePlan =
-        coll.explain("executionStats").aggregate([{$sample: {size: sampleSize}}]);
-    assertPlanForSample(optimizedSamplePlan, backupPlanSelected);
+    for (let i = 0; i < 5000; ++i) {
+        print("attempt number: " + i);
+        const optimizedSamplePlan =
+            coll.explain("executionStats").aggregate([{$sample: {size: sampleSize}}]);
+        assertPlanForSample(optimizedSamplePlan, backupPlanSelected);
+    }
 
     // Run an agg pipeline with optimization disabled.
     result = coll.aggregate([{$_internalInhibitOptimization: {}}, {$sample: {size: 1}}]).toArray();

In my runs, the loop added by this patch has never completed more than 1000 iterations before the assertion fails, which is why I infer that the probability of a single assertion failing is >0.1%.
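
As a rough sanity check of that inference, here is a minimal sketch (plain JavaScript; the per-attempt failure probabilities are illustrative assumptions, not measurements) of the chance of hitting at least one failure within the first 1000 iterations of the loop for a given per-attempt failure rate p:

// Probability of seeing at least one assertion failure within `attempts`
// independent explain() runs, if each run fails with probability p.
function probFailureWithin(p, attempts) {
    return 1 - Math.pow(1 - p, attempts);
}

print(probFailureWithin(0.001, 1000));   // ~0.63: even p = 0.1% fails most 1000-iteration runs
print(probFailureWithin(0.0034, 1000));  // ~0.97: the ~0.34% rate estimated in the Description below fails almost always

So a per-attempt failure probability of about 0.1% or more is enough to explain never seeing the loop survive 1000 iterations.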

Sprint: Query Execution 2021-05-03
Participants:
Linked BF Score: 28

 Description   

The test is inherently subject to randomness, since it is testing our random sampling implementation. It intends to make assertions based on the probability of a particular event being minuscule. However, this assertion can fail with non-negligible probability: I've shown experimentally that the probability of the assertion failing strictly due to randomness is >0.1%. Since this test will indeed run thousands of times, the probability of failure needs to be many orders of magnitude lower.

For the TrialStage to select the optimized ARHASH plan as currently configured, the algorithm needs to obtain 5 valid samples within 100 trial iterations. The buckets are only 1% full, so the likelihood of a single iteration producing a valid document is ~1%. Getting 5 hits in 100 attempts is apparently not as unlikely as it needs to be, so the test's assertion that the backup plan was selected can fail spuriously.
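
For reference, the failure rate implied by those numbers can be estimated directly. The sketch below (plain JavaScript; it assumes the trial iterations are independent and uses the ~1% per-iteration hit rate and the 5-of-100 threshold from the paragraph above) computes P(X >= 5) for X ~ Binomial(100, 0.01):

// P(X >= k) for X ~ Binomial(n, p), computed via the PMF recurrence
// P(X = i+1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p).
function binomialTail(n, p, k) {
    let term = Math.pow(1 - p, n);  // P(X = 0)
    let below = 0;                  // running sum of P(X < k)
    for (let i = 0; i < k; ++i) {
        below += term;
        term *= ((n - i) / (i + 1)) * (p / (1 - p));
    }
    return 1 - below;
}

print(binomialTail(100, 0.01, 5));  // ~0.0034

That works out to roughly 0.34%, i.e. about one run in 300, which is consistent with the >0.1% rate observed experimentally and clearly far too high for a test that is run thousands of times.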



 Comments   
Comment by Githook User [ 26/Apr/21 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-56318 Use a larger sample size to ensure backup plan is selected in timeseries_sample.js
Branch: master
https://github.com/mongodb/mongo/commit/53d8ebc6100c0d96ed14b523e8d00a17fee2a375
