[SERVER-30676] Add support for stepdown options to the concurrency framework
Created: 16/Aug/17  Updated: 30/Oct/23  Resolved: 08/Sep/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.5.13

Type: New Feature Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: sharding36-passthrough-testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-30675 Add configuration options to JavaScri... Closed
is depended on by SERVER-30677 Run the concurrency suite with CSRS p... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-09-11, Sharding 2017-10-02
Participants:

 Description   

The concurrency framework uses ShardingTest in cluster.js to manage the sharded cluster that FSM workloads run against. We should use the stepdown_thread.js override so that the CSRS primary fails over periodically while FSM workloads are running.

  • The "stepdownOptions" key should be added as a valid cluster option. The validateClusterOptions() function should be updated accordingly.
  • A new method shouldPerformContinuousStepdowns() should be added to the Cluster object. It should return clusterOptions.stepdownOptions !== undefined && clusterOptions.sharded.enabled.
  • If Cluster#shouldPerformContinuousStepdowns() returns true, then cluster.js should load the continuous_stepdown.js file and call ContinuousStepdown.configure(clusterOptions.stepdownOptions) to add the methods defined in SERVER-30675 to ReplSetTest and ShardingTest.
  • Additionally, if Cluster#shouldPerformContinuousStepdowns() returns true, then two new methods, startContinuousFailover() and stopContinuousFailover(), should be added to the Cluster object so that the stepdown threads can be suspended and restarted between FSM workload groups. Their implementations should call the corresponding methods on ShardingTest.
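A minimal sketch of the Cluster additions listed above, under loudly stated assumptions: `FakeShardingTest` is a hypothetical stub standing in for a ShardingTest that ContinuousStepdown.configure() has already extended, `VALID_CLUSTER_OPTIONS` is an assumed, reduced subset of what validateClusterOptions() really accepts, and `isFailoverRunning()` exists only so the behavior can be observed. This is not the code that landed in cluster.js.

```javascript
// Hypothetical stub for a ShardingTest that ContinuousStepdown.configure() has
// already extended; it only records whether continuous failover is "running".
function FakeShardingTest() {
    this.failoverRunning = false;
    this.startContinuousFailover = function() {
        this.failoverRunning = true;
    };
    this.stopContinuousFailover = function() {
        this.failoverRunning = false;
    };
}

// Assumed subset of valid cluster option keys, with "stepdownOptions" added.
const VALID_CLUSTER_OPTIONS = ['replication', 'sharded', 'stepdownOptions'];

function validateClusterOptions(clusterOptions) {
    for (const key of Object.keys(clusterOptions)) {
        if (!VALID_CLUSTER_OPTIONS.includes(key)) {
            throw new Error('invalid option: ' + key);
        }
    }
    // Assumed check: stepdownOptions, when present, must be an object.
    if (clusterOptions.stepdownOptions !== undefined &&
        typeof clusterOptions.stepdownOptions !== 'object') {
        throw new Error('stepdownOptions must be an object');
    }
}

function Cluster(clusterOptions) {
    validateClusterOptions(clusterOptions);
    const st = new FakeShardingTest();

    this.shouldPerformContinuousStepdowns = function() {
        // Guard against a missing "sharded" key because this sketch does not
        // fill in default options the way cluster.js does.
        return clusterOptions.stepdownOptions !== undefined &&
            Boolean(clusterOptions.sharded && clusterOptions.sharded.enabled);
    };

    if (this.shouldPerformContinuousStepdowns()) {
        // In cluster.js, this is where continuous_stepdown.js would be loaded and
        // ContinuousStepdown.configure(clusterOptions.stepdownOptions) called.
        this.startContinuousFailover = function() {
            st.startContinuousFailover();
        };
        this.stopContinuousFailover = function() {
            st.stopContinuousFailover();
        };
    }

    // Demo-only accessor so the effect of start/stop can be inspected.
    this.isFailoverRunning = function() {
        return st.failoverRunning;
    };
}
```

Only clusters that are sharded and have stepdownOptions get the start/stop methods, which is why callers such as runner.js check shouldPerformContinuousStepdowns() before calling them.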

The runWorkloadGroup() function in runner.js should be updated as follows to conditionally start the stepdown threads before each workload group runs and to stop them before teardown:

diff --git a/jstests/concurrency/fsm_libs/runner.js b/jstests/concurrency/fsm_libs/runner.js
index ec629245da..3cab3ca4d4 100644
--- a/jstests/concurrency/fsm_libs/runner.js
+++ b/jstests/concurrency/fsm_libs/runner.js
@@ -532,18 +532,30 @@ var runner = (function() {
                 cleanup.push(workload);
             });
 
+            if (cluster.shouldPerformContinuousStepdowns()) {
+                cluster.startContinuousFailover();
+            }
+
             try {
-                // Start this set of foreground workload threads.
-                threadMgr.spawnAll(cluster, executionOptions);
-                // Allow 20% of foreground threads to fail. This allows the workloads to run on
-                // underpowered test hosts.
-                threadMgr.checkFailed(0.2);
+                try {
+                    // Start this set of foreground workload threads.
+                    threadMgr.spawnAll(cluster, executionOptions);
+                    // Allow 20% of foreground threads to fail. This allows the workloads to run on
+                    // underpowered test hosts.
+                    threadMgr.checkFailed(0.2);
+                } finally {
+                    // Threads must be joined before destruction, so do this
+                    // even in the presence of exceptions.
+                    errors.push(...threadMgr.joinAll().map(
+                        e => new WorkloadFailure(
+                            e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                }
             } finally {
-                // Threads must be joined before destruction, so do this
-                // even in the presence of exceptions.
-                errors.push(...threadMgr.joinAll().map(
-                    e => new WorkloadFailure(
-                        e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                if (cluster.shouldPerformContinuousStepdowns()) {
+                    // Suspend the stepdown threads prior to calling cleanupWorkload() to avoid
+                    // causing a failover to happen while the data consistency checks are running.
+                    cluster.stopContinuousFailover();
+                }
             }
         } finally {
             // Call each foreground workload's teardown function. After all teardowns have completed
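The ordering guarantees of the nested try/finally blocks in the diff can be checked with a small model. This is a hypothetical sketch: `makeStubs` and `runWorkloadGroupSketch` are illustrative names, and the stubs only record the order of calls rather than spawning real FSM threads.

```javascript
// Hypothetical stubs that record the order in which runner.js-style calls happen.
function makeStubs(log, opts) {
    const cluster = {
        shouldPerformContinuousStepdowns: () => opts.stepdowns,
        startContinuousFailover: () => log.push('startFailover'),
        stopContinuousFailover: () => log.push('stopFailover'),
    };
    const threadMgr = {
        spawnAll: () => {
            log.push('spawnAll');
            if (opts.spawnThrows) {
                throw new Error('spawn failed');
            }
        },
        checkFailed: () => log.push('checkFailed'),
        joinAll: () => {
            log.push('joinAll');
            return [];  // no thread failures in this model
        },
    };
    return {cluster, threadMgr};
}

// Simplified model of the control flow in runWorkloadGroup() after the change.
function runWorkloadGroupSketch(cluster, threadMgr, log) {
    const errors = [];
    try {
        if (cluster.shouldPerformContinuousStepdowns()) {
            cluster.startContinuousFailover();
        }
        try {
            try {
                threadMgr.spawnAll();
                threadMgr.checkFailed(0.2);
            } finally {
                // Threads are joined even if spawnAll() or checkFailed() threw.
                errors.push(...threadMgr.joinAll());
            }
        } finally {
            if (cluster.shouldPerformContinuousStepdowns()) {
                // Failovers stop before teardown and data consistency checks.
                cluster.stopContinuousFailover();
            }
        }
    } finally {
        log.push('teardown');
    }
    return errors;
}
```

Even when spawnAll() throws, the model still joins the threads, stops the stepdown threads, and then runs teardown, in that order, before the exception propagates.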

Note: While the changes should be tested locally and in an Evergreen patch build with a patch similar to the following, the actual work of adding new resmoke.py YAML suites and Evergreen tasks for running under this configuration will happen under a follow-up SERVER ticket.

diff --git a/jstests/concurrency/fsm_all_sharded_replication.js b/jstests/concurrency/fsm_all_sharded_replication.js
index 66de8c45ff..3304f01b49 100644
--- a/jstests/concurrency/fsm_all_sharded_replication.js
+++ b/jstests/concurrency/fsm_all_sharded_replication.js
@@ -96,7 +96,11 @@ var blacklist = [
     return dir + '/' + file;
 });
 
-runWorkloadsSerially(ls(dir).filter(function(file) {
-    return !Array.contains(blacklist, file);
-}),
-                     {sharded: {enabled: true}, replication: {enabled: true}});
+runWorkloadsSerially(
+    ls(dir).filter(function(file) {
+        return !Array.contains(blacklist, file);
+    }),
+    {
+      sharded: {enabled: true, stepdownOptions: {configStepdown: true, shardStepdown: false}},
+      replication: {enabled: true}
+    });



 Comments   
Comment by Ramon Fernandez Marina [ 08/Sep/17 ]

Author:

{'username': u'jsmulrow', 'name': u'Jack Mulrow', 'email': u'jack.mulrow@mongodb.com'}

Message: SERVER-30676 Add support for stepdown options to the concurrency framework
Branch:master
https://github.com/mongodb/mongo/commit/27a5f280a09165cd39a3fe35cf5cb4fe2372f318

Generated at Thu Feb 08 04:24:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.