[SERVER-34555] Migrate concurrency_sharded_with_stepdowns{,_and_balancer}.yml test suites to run directly via resmoke.py Created: 18/Apr/18  Updated: 29/Oct/23  Resolved: 31/May/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.0.0-rc3, 4.1.1

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jonathan Abrahams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-35051 Resmoke should stop the balancer befo... Closed
Problem/Incident
causes SERVER-36169 Resmoke: bare raise outside except in... Closed
Related
related to SERVER-41096 ContinuousStepdown thread and resmoke... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6
Sprint: TIG 2018-05-07, TIG 2018-05-21, TIG 2018-06-04, TIG 2018-06-18
Participants:
Linked BF Score: 12
Story Points: 5

 Description   

The changes from SERVER-19630 make FSM workloads run as individual test cases in the concurrency_sharded_causal_consistency{,_and_balancer}.yml and concurrency_sharded_replication{,_and_balancer}.yml test suites. The concurrency_sharded_with_stepdowns{,_and_balancer}.yml test suites weren't migrated to the new style because parts of setting up the environment for the FSM workloads aren't prepared to have the primary of the CSRS or of a replica set shard stepped down. Rather than trying to get all the retry logic correct (e.g. by handling the ManualInterventionRequired error when attempting to shard the collection), we should instead delay resmoke.py's StepdownThread so that it only starts running after the FSM workload has started.

A sketch of the interactions between the _StepdownThread class and resmoke_runner.js via the filesystem is given in comments at the appropriate places in the runWorkloads() function below.

diff --git a/jstests/concurrency/fsm_libs/resmoke_runner.js b/jstests/concurrency/fsm_libs/resmoke_runner.js
index d94fd4e31c..af0afca2bb 100644
--- a/jstests/concurrency/fsm_libs/resmoke_runner.js
+++ b/jstests/concurrency/fsm_libs/resmoke_runner.js
@@ -104,6 +104,15 @@
                 cleanup.push(workload);
             });
 
+            // After the $config.setup() function has been called, it is safe for the stepdown
+            // thread to start running. The main thread won't attempt to interact with the cluster
+            // until all of the spawned worker threads have finished.
+            //
+            // TODO: Call the writeFile('./stepdown_permitted', '') function to indicate that the
+            // stepdown thread can run. It is unnecessary for the stepdown thread to indicate that
+            // it is going to start running because it will eventually do so after the worker
+            // threads have started.
+
             // Since the worker threads may be running with causal consistency enabled, we set the
             // initial clusterTime and initial operationTime for the sessions they'll create so that
             // they are guaranteed to observe the effects of the workload's $config.setup() function
@@ -128,17 +137,34 @@
             }
 
             try {
-                // Start this set of worker threads.
-                threadMgr.spawnAll(cluster, executionOptions);
-                // Allow 20% of the threads to fail. This allows the workloads to run on
-                // underpowered test hosts.
-                threadMgr.checkFailed(0.2);
+                try {
+                    // Start this set of worker threads.
+                    threadMgr.spawnAll(cluster, executionOptions);
+                    // Allow 20% of the threads to fail. This allows the workloads to run on
+                    // underpowered test hosts.
+                    threadMgr.checkFailed(0.2);
+                } finally {
+                    // Threads must be joined before destruction, so do this even in the presence of
+                    // exceptions.
+                    errors.push(...threadMgr.joinAll().map(
+                        e => new WorkloadFailure(
+                            e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                }
             } finally {
-                // Threads must be joined before destruction, so do this even in the presence of
-                // exceptions.
-                errors.push(...threadMgr.joinAll().map(
-                    e => new WorkloadFailure(
-                        e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                // Until we are guaranteed that the stepdown thread isn't running, it isn't safe for
+                // the $config.teardown() function to be called. We should signal to resmoke.py that
+                // the stepdown thread should stop running and wait for the stepdown thread to
+                // signal that it has stopped.
+                //
+                // TODO: Call removeFile('./stepdown_permitted') so that the next time the
+                // stepdown thread checks whether it should keep running, it instead stops
+                // stepping down the cluster and creates a file named "./stepdown_off".
+                //
+                // TODO: Call the ls() function inside of an assert.soon() / assert.soonNoExcept()
+                // and wait for the "./stepdown_off" file to be created. assert.soonNoExcept()
+                // should probably be used so that an I/O-related error from attempting to list the
+                // contents of the directory while the file is being created doesn't lead to a
+                // JavaScript exception that causes the test to fail.
             }
         } finally {
             // Call each workload's teardown function. After all teardowns have completed check if
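
The file-based handshake described in the TODO comments above could look roughly like the following from the resmoke.py side. This is a minimal sketch, not the code that was committed: the class name StepdownThreadSketch, the helper run_workload_sketch(), the polling interval, and the stepdowns counter are all hypothetical, and only the two file names come from the comments above. The sketch also assumes that the thread writes "stepdown_off" whenever it is not permitted to run, and that it re-checks the permission file after clearing a stale "stepdown_off", so that the runner cannot observe an old acknowledgement while a stepdown is still pending.

```python
import os
import threading
import time


class StepdownThreadSketch(threading.Thread):
    """Hypothetical stand-in for resmoke.py's _StepdownThread (illustration only)."""

    def __init__(self, directory, poll_secs=0.01):
        super().__init__(daemon=True)
        # File names taken from the TODO comments; everything else is assumed.
        self.permitted_path = os.path.join(directory, "stepdown_permitted")
        self.off_path = os.path.join(directory, "stepdown_off")
        self.poll_secs = poll_secs
        self.stepdowns = 0  # placeholder counter instead of real stepdowns
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            if os.path.exists(self.permitted_path):
                # Clear any stale acknowledgement before acting.
                try:
                    os.remove(self.off_path)
                except FileNotFoundError:
                    pass
                # Re-check: permission may have been withdrawn while a stale
                # "stepdown_off" file was still visible to the runner.
                if os.path.exists(self.permitted_path):
                    self.stepdowns += 1  # a real thread would step down a primary here
            else:
                # Not permitted: acknowledge that no stepdown is in progress.
                open(self.off_path, "w").close()
            self._stop_event.wait(self.poll_secs)

    def stop(self):
        self._stop_event.set()
        self.join()


def run_workload_sketch(directory, workload, timeout_secs=5.0):
    """Hypothetical Python model of the runner side of the handshake."""
    permitted_path = os.path.join(directory, "stepdown_permitted")
    off_path = os.path.join(directory, "stepdown_off")

    # Corresponds to writeFile('./stepdown_permitted', '') after $config.setup().
    open(permitted_path, "w").close()
    try:
        workload()
    finally:
        # Corresponds to removeFile('./stepdown_permitted').
        os.remove(permitted_path)
        # Corresponds to polling ls() inside assert.soonNoExcept().
        deadline = time.monotonic() + timeout_secs
        while not os.path.exists(off_path):
            if time.monotonic() > deadline:
                raise TimeoutError("stepdown thread never created stepdown_off")
            time.sleep(0.01)
```

In this model, run_workload_sketch() only returns once "stepdown_off" exists, which is the point at which calling $config.teardown() would be safe.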



 Comments   
Comment by Githook User [ 05/Jun/18 ]

Author: Jonathan Abrahams (hptabster) <jonathan@mongodb.com>

Message: SERVER-34555 Blacklist jstests/concurrency/fsm_workloads/group*.js in the concurreny sharded stepdowns suites
Branch: v4.0
https://github.com/mongodb/mongo/commit/4717757f21d8700a48b645e75eaffda6ef62a432

Comment by Githook User [ 04/Jun/18 ]

Author: Jonathan Abrahams (hptabster) <jonathan@mongodb.com>

Message: SERVER-34555 Add stepdown to FSM resmoke integration

(cherry picked from commit 2b10b06044dfaaf5b9c37f4379521f14e9bdb0e5)
Branch: v4.0
https://github.com/mongodb/mongo/commit/bf72dbc9922412b01c0e4d2f485338aa7ae9b76c

Comment by Githook User [ 31/May/18 ]

Author: Jonathan Abrahams (hptabster) <jonathan@mongodb.com>

Message: SERVER-34555 Add stepdown to FSM resmoke integration
Branch: master
https://github.com/mongodb/mongo/commit/2b10b06044dfaaf5b9c37f4379521f14e9bdb0e5
