[SERVER-34548] Make FSM workloads able to be run via burn_in_tests.py (with --repeat=2) Created: 18/Apr/18  Updated: 16/May/18  Resolved: 16/May/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vesselina Ratcheva (Inactive) Assignee: DO NOT USE - Backlog - Test Infrastructure Group (TIG)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-30204 Create resmoke.py hook that drops all... Closed
Operating System: ALL
Participants:
Story Points: 3

 Description   

Individual FSM workloads are not designed to clean up after themselves; rather, they expect the runners to take care of that. This can be problematic when you create or modify a workload, because burn_in_tests picks that workload up and runs it several times without any evident cleanup between runs. As a result, the test can conflict with itself (e.g., by the second run it may try to create a database that already exists).
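To illustrate the failure mode, here is a minimal toy model (not the real FSM library; all names are made up for illustration). A fake "server" tracks database names, and a workload's setup fails if its unique database already exists, mirroring how a workload with no teardown conflicts with itself when repeated:

```javascript
// Toy model of the repeat-run conflict. The Set stands in for the server's
// list of databases; nothing here touches a real MongoDB deployment.
const existingDatabases = new Set();

const workload = {
    // Each FSM workload derives a unique database name from its own name.
    uniqueDbName: 'fsmdb_create_database',

    setup() {
        if (existingDatabases.has(this.uniqueDbName)) {
            throw new Error('database already exists: ' + this.uniqueDbName);
        }
        existingDatabases.add(this.uniqueDbName);
    },

    // No teardown: the workload relies on the runner to clean up after it.
};

workload.setup();  // first run succeeds

let secondRunError = null;
try {
    workload.setup();  // second run conflicts with the leftover database
} catch (e) {
    secondRunError = e;
}
```

With --repeat=2 the second invocation plays the role of the second setup() call above: the state left behind by run one makes run two fail.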



 Comments   
Comment by Max Hirschhorn [ 16/May/18 ]

I've verified locally that the following resmoke.py invocation now succeeds; the changes from SERVER-30204 have made it so that running with --repeat=2 no longer fails for FSM workloads that use their own unique database name. I'm closing this ticket as a duplicate.

python buildscripts/resmoke.py --suites=concurrency jstests/concurrency/fsm_workloads/create_database.js --repeat=2

Comment by Robert Guo (Inactive) [ 26/Apr/18 ]

max.hirschhorn Running an additional validation sounds good before SERVER-30204 is implemented. For the dropAllDatabases() call, should we put it in runner.js instead? I think that would allow us to start removing drops from workload teardowns without having to port all of the FSM suites over to resmoke first.

Comment by Max Hirschhorn [ 18/Apr/18 ]

burn_in_tests.py failing when running the create_database.js FSM workload is explained by the workload not defining a $config.teardown function to clean up its unique database name. The intent is that, once SERVER-30204 is implemented, FSM workloads shouldn't need to (and would be discouraged from) dropping any collections or databases at the end of the test, because doing so would undermine our ability to verify the data is consistent.

robert.guo, I hadn't thought too deeply about this while working on SERVER-30203, but we've lost coverage for running the data consistency checks before $config.teardown is called.

diff --git a/jstests/concurrency/fsm_libs/resmoke_runner.js b/jstests/concurrency/fsm_libs/resmoke_runner.js
index d94fd4e31c..bacd2070ae 100644
--- a/jstests/concurrency/fsm_libs/resmoke_runner.js
+++ b/jstests/concurrency/fsm_libs/resmoke_runner.js
@@ -19,6 +19,24 @@
     function cleanupWorkload(workload, context, cluster, errors, header) {
         const phase = 'before workload ' + workload + ' teardown';
 
+        try {
+            // Ensure that all data has replicated correctly to the secondaries before calling the
+            // workload's teardown method.
+            cluster.checkReplicationConsistency([], phase);
+        } catch (e) {
+            errors.push(new WorkloadFailure(
+                e.toString(), e.stack, 'main', header + ' checking consistency on secondaries'));
+            return false;
+        }
+
+        try {
+            cluster.validateAllCollections(phase);
+        } catch (e) {
+            errors.push(new WorkloadFailure(
+                e.toString(), e.stack, 'main', header + ' validating collections'));
+            return false;
+        }
+
         try {
             teardownWorkload(workload, context, cluster);
         } catch (e) {

As far as addressing the issue Vessy is running into goes, what do you think about having resmoke_runner.js call dropAllDatabases() immediately after setting up the cluster?
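A rough sketch of what such a dropAllDatabases() helper could look like (the function name comes from the discussion above, but its body here is an assumption, not the actual resmoke_runner.js implementation). It uses the mongo shell's listDatabases command and getSiblingDB().dropDatabase() to drop everything except the internal databases:

```javascript
// Hypothetical helper: drop all non-internal databases so a repeated run
// starts from a clean state. `db` is expected to expose the mongo shell's
// adminCommand() and getSiblingDB() interface.
function dropAllDatabases(db, blacklist = ['admin', 'config', 'local']) {
    // listDatabases returns {databases: [{name: ...}, ...]}.
    const dbNames =
        db.adminCommand({listDatabases: 1, nameOnly: true}).databases.map(info => info.name);

    const dropped = [];
    for (const name of dbNames) {
        if (blacklist.includes(name)) {
            continue;  // never drop the internal databases
        }
        db.getSiblingDB(name).dropDatabase();
        dropped.push(name);
    }
    return dropped;
}
```

Called right after cluster setup, this would clear any databases left behind by a previous --repeat iteration before the workload's own setup runs.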

Comment by Max Hirschhorn [ 18/Apr/18 ]

vesselina.ratcheva, could you provide a link to the patch build where you saw the burn_in_tests task fail? It should in general be possible to run FSM workloads with --repeat > 1. The concurrency framework drops the collection before starting to run the FSM workload as part of the prepareCollections() function.

I would also expect FSM workloads that create their own collection or database (where the FSM workload name is used as a unique identifier) to currently drop those collections in their $config.teardown function. The intent is for SERVER-30204 to address this by adding a version of the CleanEveryN hook that doesn't require restarting the entire MongoDB deployment after every FSM workload.

If there's something I'm missing and burn_in_tests.py cannot run FSM workloads multiple times, we should be able to add jstests/concurrency/fsm_workloads/*.js to the "exclude_files" section of burn_in_tests.yml.
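For reference, the fallback exclusion might look something like the fragment below. Only the "exclude_files" key and the glob come from the comment above; the surrounding structure of burn_in_tests.yml is an assumption.

```yaml
# Hypothetical burn_in_tests.yml fragment: skip all FSM workloads if they
# cannot safely be run multiple times by burn_in_tests.py.
selector:
  exclude_files:
  - jstests/concurrency/fsm_workloads/*.js
```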

Generated at Thu Feb 08 04:37:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.