  1. Core Server
  2. SERVER-30676

Add support for stepdown options to the concurrency framework

    • Fully Compatible
    • Sharding 2017-09-11, Sharding 2017-10-02

      The concurrency framework uses ShardingTest in cluster.js to manage the sharded cluster that FSM workloads run against. We should use the stepdown_thread.js override so that the CSRS primary fails over periodically while FSM workloads are running.

      • The "stepdownOptions" key should be added as a valid cluster option. The validateClusterOptions() function should be updated accordingly.
      • A new method shouldPerformContinuousStepdowns() should be added to the Cluster object. It should return clusterOptions.stepdownOptions !== undefined && clusterOptions.sharded.enabled.
      • If Cluster#shouldPerformContinuousStepdowns() returns true, then cluster.js should load the continuous_stepdown.js file and call ContinuousStepdown.configure(clusterOptions.stepdownOptions) to add the methods defined in SERVER-30675 to ReplSetTest and ShardingTest.
      • Additionally, if Cluster#shouldPerformContinuousStepdowns() returns true, then two new methods, startContinuousFailover() and stopContinuousFailover(), should be added to the Cluster object so that the stepdown threads can be suspended and restarted between FSM workload groups. Their implementations should call the corresponding methods on ShardingTest.
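
The bullets above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the Cluster object, not the actual cluster.js code: the allowedKeys list is an assumed subset, the shardingTest parameter is a stub injected for illustration, and the step that loads continuous_stepdown.js and calls ContinuousStepdown.configure() is left out. It also assumes the ShardingTest methods share the names startContinuousFailover() and stopContinuousFailover().

```javascript
// Hypothetical, simplified stand-in for the Cluster object in cluster.js.
// `shardingTest` is an injected stub; the real code would use ShardingTest.
function Cluster(clusterOptions, shardingTest) {
    // Assumed subset of valid keys, with "stepdownOptions" newly allowed.
    const allowedKeys = ['replication', 'sharded', 'stepdownOptions'];

    function validateClusterOptions(options) {
        Object.keys(options).forEach(function(key) {
            if (!allowedKeys.includes(key)) {
                throw new Error('invalid option: ' + key +
                                '; valid options are: ' + allowedKeys.join(', '));
            }
        });
    }

    validateClusterOptions(clusterOptions);

    // True only for sharded clusters that were given stepdownOptions.
    this.shouldPerformContinuousStepdowns = function() {
        return clusterOptions.stepdownOptions !== undefined &&
            Boolean(clusterOptions.sharded && clusterOptions.sharded.enabled);
    };

    // The failover controls exist only when continuous stepdowns are enabled.
    if (this.shouldPerformContinuousStepdowns()) {
        this.startContinuousFailover = function() {
            shardingTest.startContinuousFailover();  // assumed method name
        };
        this.stopContinuousFailover = function() {
            shardingTest.stopContinuousFailover();  // assumed method name
        };
    }
}
```

Because the two failover methods are only defined conditionally, callers are expected to check shouldPerformContinuousStepdowns() before using them.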

      The runWorkloadGroup() function in runner.js should be updated as follows to conditionally start the stepdown threads:

      diff --git a/jstests/concurrency/fsm_libs/runner.js b/jstests/concurrency/fsm_libs/runner.js
      index ec629245da..3cab3ca4d4 100644
      --- a/jstests/concurrency/fsm_libs/runner.js
      +++ b/jstests/concurrency/fsm_libs/runner.js
      @@ -532,18 +532,30 @@ var runner = (function() {
                       cleanup.push(workload);
                   });
      
      +            if (cluster.shouldPerformContinuousStepdowns()) {
      +                cluster.startContinuousFailover();
      +            }
      +
                   try {
      -                // Start this set of foreground workload threads.
      -                threadMgr.spawnAll(cluster, executionOptions);
      -                // Allow 20% of foreground threads to fail. This allows the workloads to run on
      -                // underpowered test hosts.
      -                threadMgr.checkFailed(0.2);
      +                try {
      +                    // Start this set of foreground workload threads.
      +                    threadMgr.spawnAll(cluster, executionOptions);
      +                    // Allow 20% of foreground threads to fail. This allows the workloads to run on
      +                    // underpowered test hosts.
      +                    threadMgr.checkFailed(0.2);
      +                } finally {
      +                    // Threads must be joined before destruction, so do this
      +                    // even in the presence of exceptions.
      +                    errors.push(...threadMgr.joinAll().map(
      +                        e => new WorkloadFailure(
      +                            e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
      +                }
                   } finally {
      -                // Threads must be joined before destruction, so do this
      -                // even in the presence of exceptions.
      -                errors.push(...threadMgr.joinAll().map(
      -                    e => new WorkloadFailure(
      -                        e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' '))));
      +                if (cluster.shouldPerformContinuousStepdowns()) {
      +                    // Suspend the stepdown threads prior to calling cleanupWorkload() to avoid
      +                    // causing a failover to happen while the data consistency checks are running.
      +                    cluster.stopContinuousFailover();
      +                }
                   }
               } finally {
                   // Call each foreground workload's teardown function. After all teardowns have completed
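
The ordering that the nested try/finally blocks in the diff above are meant to guarantee can be illustrated with a small standalone sketch. The function name runGroupSketch and the plain-object cluster and threadMgr arguments are hypothetical stand-ins, not part of runner.js; the WorkloadFailure wrapping is omitted for brevity.

```javascript
// Hypothetical stand-in for the relevant part of runWorkloadGroup().
// `cluster` and `threadMgr` are any objects exposing the methods used below.
function runGroupSketch(cluster, threadMgr, errors) {
    if (cluster.shouldPerformContinuousStepdowns()) {
        cluster.startContinuousFailover();
    }

    try {
        try {
            threadMgr.spawnAll();
            threadMgr.checkFailed(0.2);
        } finally {
            // Inner finally: always join the workload threads first,
            // even if spawning or the failure check threw.
            errors.push(...threadMgr.joinAll());
        }
    } finally {
        // Outer finally: stop failovers only after the threads are joined,
        // so later cleanup and consistency checks see a stable primary.
        if (cluster.shouldPerformContinuousStepdowns()) {
            cluster.stopContinuousFailover();
        }
    }
}
```

With this shape, an exception thrown while spawning threads still results in the threads being joined and then the stepdown threads being stopped, in that order, before the exception propagates.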
      

      Note: While the changes should be tested locally and in an Evergreen patch build with a patch similar to the following, the actual work to add new resmoke.py YAML suites and Evergreen tasks for running under this configuration will happen under a follow-up SERVER ticket.

      diff --git a/jstests/concurrency/fsm_all_sharded_replication.js b/jstests/concurrency/fsm_all_sharded_replication.js
      index 66de8c45ff..3304f01b49 100644
      --- a/jstests/concurrency/fsm_all_sharded_replication.js
      +++ b/jstests/concurrency/fsm_all_sharded_replication.js
      @@ -96,7 +96,11 @@ var blacklist = [
           return dir + '/' + file;
       });
      
      -runWorkloadsSerially(ls(dir).filter(function(file) {
      -    return !Array.contains(blacklist, file);
      -}),
      -                     {sharded: {enabled: true}, replication: {enabled: true}});
      +runWorkloadsSerially(
      +    ls(dir).filter(function(file) {
      +        return !Array.contains(blacklist, file);
      +    }),
      +    {
      +      sharded: {enabled: true, stepdownOptions: {configStepdown: true, shardStepdown: false}},
      +      replication: {enabled: true}
      +    });
      

            Assignee:
            jack.mulrow@mongodb.com Jack Mulrow
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn