- Type: New Feature
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Testing Infrastructure
- Fully Compatible
- Sharding 2017-09-11, Sharding 2017-10-02
The concurrency framework uses ShardingTest in cluster.js to manage the sharded cluster that FSM workloads run against. We should use the stepdown_thread.js override so that the CSRS primary fails over periodically while FSM workloads are running.
- The "stepdownOptions" key should be added as a valid cluster option. The validateClusterOptions() function should be updated accordingly.
- A new method shouldPerformContinuousStepdowns() should be added to the Cluster object. It should return clusterOptions.stepdownOptions !== undefined && clusterOptions.sharded.enabled.
- If Cluster#shouldPerformContinuousStepdowns() returns true, then cluster.js should load the continuous_stepdown.js file and call ContinuousStepdown.configure(clusterOptions.stepdownOptions) to add the methods defined in SERVER-30675 to ReplSetTest and ShardingTest.
- Additionally, if Cluster#shouldPerformContinuousStepdowns() returns true, then two new methods startContinuousFailover() and stopContinuousFailover() should be added to the Cluster object so that the stepdown threads can be suspended and restarted between FSM workload groups. Their implementations should call the corresponding methods on ShardingTest.
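The Cluster changes listed above could look roughly like the following. This is only a sketch in plain JavaScript, not the actual cluster.js code: makeCluster(), the abbreviated allowedKeys list, and the stubbed shardingTest argument are hypothetical stand-ins, and the method names on the stub are assumed to mirror the ones this ticket proposes for Cluster.

```javascript
// Hypothetical, simplified stand-in for the Cluster object in cluster.js.
// `shardingTest` is assumed to expose startContinuousFailover() and
// stopContinuousFailover(); that is an assumption about SERVER-30675.
function makeCluster(clusterOptions, shardingTest) {
    // Abbreviated list of valid option keys, with "stepdownOptions" now accepted.
    const allowedKeys = ['sharded', 'replication', 'stepdownOptions'];
    Object.keys(clusterOptions).forEach(function(key) {
        if (!allowedKeys.includes(key)) {
            throw new Error('invalid cluster option: ' + key);
        }
    });

    // Fill in a default so the check below never dereferences undefined.
    const sharded = clusterOptions.sharded || {enabled: false};

    const cluster = {
        shouldPerformContinuousStepdowns: function() {
            return clusterOptions.stepdownOptions !== undefined && sharded.enabled;
        },
    };

    // The failover controls only exist when continuous stepdowns are requested.
    if (cluster.shouldPerformContinuousStepdowns()) {
        cluster.startContinuousFailover = function() {
            shardingTest.startContinuousFailover();
        };
        cluster.stopContinuousFailover = function() {
            shardingTest.stopContinuousFailover();
        };
    }

    return cluster;
}
```

Attaching the two failover methods conditionally means callers have to guard every use with shouldPerformContinuousStepdowns(), which is what the runner.js change does.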
The runWorkloadGroup() function in runner.js should be updated as follows to conditionally start the stepdown threads:
diff --git a/jstests/concurrency/fsm_libs/runner.js b/jstests/concurrency/fsm_libs/runner.js
index ec629245da..3cab3ca4d4 100644
--- a/jstests/concurrency/fsm_libs/runner.js
+++ b/jstests/concurrency/fsm_libs/runner.js
@@ -532,18 +532,30 @@ var runner = (function() {
                 cleanup.push(workload);
             });
 
+            if (cluster.shouldPerformContinuousStepdowns()) {
+                cluster.startContinuousFailover();
+            }
+
             try {
-                // Start this set of foreground workload threads.
-                threadMgr.spawnAll(cluster, executionOptions);
-                // Allow 20% of foreground threads to fail. This allows the workloads to run on
-                // underpowered test hosts.
-                threadMgr.checkFailed(0.2);
+                try {
+                    // Start this set of foreground workload threads.
+                    threadMgr.spawnAll(cluster, executionOptions);
+                    // Allow 20% of foreground threads to fail. This allows the workloads to run on
+                    // underpowered test hosts.
+                    threadMgr.checkFailed(0.2);
+                } finally {
+                    // Threads must be joined before destruction, so do this
+                    // even in the presence of exceptions.
+                    errors.push(...threadMgr.joinAll().map(
+                        e => new WorkloadFailure(
+                            e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                }
             } finally {
-                // Threads must be joined before destruction, so do this
-                // even in the presence of exceptions.
-                errors.push(...threadMgr.joinAll().map(
-                    e => new WorkloadFailure(
-                        e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
+                if (cluster.shouldPerformContinuousStepdowns()) {
+                    // Suspend the stepdown threads prior to calling cleanupWorkload() to avoid
+                    // causing a failover to happen while the data consistency checks are running.
+                    cluster.stopContinuousFailover();
+                }
             }
         } finally {
             // Call each foreground workload's teardown function. After all teardowns have completed
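The ordering that the nested try/finally blocks in the runner.js change are meant to guarantee can be illustrated with a small standalone sketch. runWorkloadGroupSketch() and its step names are hypothetical; the function only records which step would run, and the catch clause exists purely so the sketch can return its log when the workload throws.

```javascript
// Hypothetical stand-in for runWorkloadGroup(): records steps instead of doing real work.
function runWorkloadGroupSketch(cluster, spawnAndCheck) {
    const steps = [];
    try {
        if (cluster.shouldPerformContinuousStepdowns()) {
            steps.push('startContinuousFailover');
        }
        try {
            try {
                steps.push('spawnAll');
                spawnAndCheck();  // may throw, like threadMgr.checkFailed(0.2)
            } finally {
                // Threads are joined even if the workload threw.
                steps.push('joinAll');
            }
        } finally {
            // Stepdowns are suspended before teardown and consistency checks run.
            if (cluster.shouldPerformContinuousStepdowns()) {
                steps.push('stopContinuousFailover');
            }
        }
    } catch (e) {
        // Added only for this sketch so the recorded steps can be returned.
        steps.push('error: ' + e.message);
    } finally {
        steps.push('teardown');
    }
    return steps;
}
```

Even when spawnAndCheck() throws, the recorded order is join, then stop, then teardown, so no failover is triggered while teardown and the data consistency checks are running.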
Note: While the changes should be tested locally and in an Evergreen patch build with a patch similar to the following, the actual work to add new resmoke.py YAML suites and Evergreen tasks for running under this configuration will happen under a follow-up SERVER ticket.
diff --git a/jstests/concurrency/fsm_all_sharded_replication.js b/jstests/concurrency/fsm_all_sharded_replication.js
index 66de8c45ff..3304f01b49 100644
--- a/jstests/concurrency/fsm_all_sharded_replication.js
+++ b/jstests/concurrency/fsm_all_sharded_replication.js
@@ -96,7 +96,11 @@ var blacklist = [
     return dir + '/' + file;
 });
 
-runWorkloadsSerially(ls(dir).filter(function(file) {
-    return !Array.contains(blacklist, file);
-}),
-                     {sharded: {enabled: true}, replication: {enabled: true}});
+runWorkloadsSerially(
+    ls(dir).filter(function(file) {
+        return !Array.contains(blacklist, file);
+    }),
+    {
+      sharded: {enabled: true, stepdownOptions: {configStepdown: true, shardStepdown: false}},
+      replication: {enabled: true}
+    });
- depends on: SERVER-30675 Add configuration options to JavaScript stepdown thread (Closed)
- is depended on by: SERVER-30677 Run the concurrency suite with CSRS primary stepdowns (Closed)