  1. Core Server
  2. SERVER-30685

Implement stepdown thread in Python as a resmoke.py hook

    • Fully Compatible
    • TIG 2017-09-11

      A background thread should be added as a resmoke.py hook (i.e. a class to run dedicated logic before a test starts or after a test finishes) to control whether failovers are being triggered on the resmoke.py fixture. We don't want a stepdown to occur while we are validating data consistency (i.e. during ValidateCollections and CheckReplDBHash), so being able to suspend the stepdown thread is a necessary feature.

      • Create buildscripts/resmokelib/testing/hooks/stepdown.py file.
      • Define a _StepdownThread class as a subclass of threading.Thread with a run() method that behaves roughly as follows:
      if isinstance(self.fixture, fixtures.replicaset.ReplicaSetFixture):
          replica_sets = [self.fixture]
      elif isinstance(self.fixture, fixtures.shardedcluster.ShardedClusterFixture):
          # TODO: Check configuration options to see whether 'config_stepdown' and 'shard_stepdown' are
          # both requested.
          replica_sets = [self.fixture.configsvr]
          if self.fixture.num_rs_nodes_per_shard is not None:
              replica_sets.extend(self.fixture.shards)
      
      while True:
          if self.__should_stop.is_set():
              break
      
          for replica_set in replica_sets:
              # TODO: Handle exceptions from getting the primary or from trying to step it down.
              client = utils.new_mongo_client(port=replica_set.get_primary().port)
              # Field order matters for commands, hence bson.SON rather than a plain dict.
              client.admin.command(bson.SON([
                  ("replSetStepDown", self.stepdown_duration_secs),
                  ("force", True),
              ]))
      
          self.__should_stop.wait(self.stepdown_interval_ms / 1000.0)
      
      self.__terminated.set()
      

      Note: In addition to a "running" and "terminated" state, the _StepdownThread could have an explicit "suspended" state where it waits on a condition variable until signaled by resmoke.py's job thread to resume triggering failovers. This would avoid spawning and joining a new stepdown thread every time the resmoke.py hook runs.
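
The "suspended" state described in the note could be sketched roughly as follows. This is a minimal illustration, not the resmoke.py implementation: the class name SuspendableThread, its method names, and the `action` callback (which stands in for the stepdown logic) are all hypothetical, and Python 3 is assumed for `Condition.wait_for`.

```python
import threading
import time


class SuspendableThread(threading.Thread):
    """Hypothetical sketch of a worker with running/suspended/terminated states."""

    def __init__(self, action, interval_secs):
        super().__init__(daemon=True)
        self._action = action                # stand-in for "step down each primary"
        self._interval_secs = interval_secs
        self._cond = threading.Condition()
        self._suspended = True               # stay idle until resume() is called
        self._stopped = False

    def run(self):
        # The lock is held while the action runs and released only inside
        # wait()/wait_for(), so pause() cannot return in the middle of an action.
        with self._cond:
            while not self._stopped:
                if self._suspended:
                    self._cond.wait()
                    continue
                self._action()
                # Sleep for the interval, waking early if paused or stopped.
                self._cond.wait_for(
                    lambda: self._stopped or self._suspended,
                    timeout=self._interval_secs)

    def resume(self):
        with self._cond:
            self._suspended = False
            self._cond.notify_all()

    def pause(self):
        with self._cond:
            self._suspended = True
            self._cond.notify_all()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self.join()
```

Because the action runs while the condition's lock is held, pause() only returns once any in-flight action has finished, which is the property the data consistency checks would need. The trade-off is that pause() can block for as long as one action takes.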

      • Define a ContinuousStepdown class as a subclass of the hooks.interface.CustomBehavior class with a before_test() method that spawns a _StepdownThread instance (or signals it via a condition variable to resume), and an after_test() method to suspend the thread. The configuration options for the ContinuousStepdown class should match what is mentioned in SERVER-30675.
      config_stepdown: boolean (defaults to true)
      election_timeout_ms: number (defaults to 5000, i.e. 5 seconds)
      shard_stepdown: boolean (defaults to true)
      stepdown_duration_secs: number (defaults to 10 seconds)
      stepdown_interval_ms: number (defaults to 8000, i.e. 8 seconds)
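
One possible shape for the hook class, with the options above as keyword arguments, is sketched here. Several things are assumptions made for illustration: the CustomBehavior base class is a local stand-in for hooks.interface.CustomBehavior (whose real method signatures are not given in this ticket), `thread_factory` is a hypothetical injection point, and the thread object is assumed to expose start(), resume(), and pause().

```python
class CustomBehavior(object):
    """Local stand-in for hooks.interface.CustomBehavior (signatures assumed)."""

    def before_test(self, test, test_report):
        pass

    def after_test(self, test, test_report):
        pass


class ContinuousStepdown(CustomBehavior):
    """Sketch of a hook that resumes stepdowns before a test and suspends them after."""

    def __init__(self, fixture, thread_factory,
                 config_stepdown=True,
                 election_timeout_ms=5000,
                 shard_stepdown=True,
                 stepdown_duration_secs=10,
                 stepdown_interval_ms=8000):
        self.fixture = fixture
        self.config_stepdown = config_stepdown
        self.election_timeout_ms = election_timeout_ms
        self.shard_stepdown = shard_stepdown
        self.stepdown_duration_secs = stepdown_duration_secs
        self.stepdown_interval_ms = stepdown_interval_ms
        # Hypothetical: a callable that builds the stepdown thread from this hook.
        self._thread_factory = thread_factory
        self._thread = None

    def before_test(self, test, test_report):
        # Spawn the thread once, then only signal it to resume on later tests.
        if self._thread is None:
            self._thread = self._thread_factory(self)
            self._thread.start()
        self._thread.resume()

    def after_test(self, test, test_report):
        # Suspend failovers so that later hooks can validate data consistency.
        self._thread.pause()
```

Creating the thread lazily in before_test() and reusing it afterwards matches the goal of not spawning and joining a thread for every test.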
      

            Assignee:
            yves.duhem Yves Duhem
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn