  1. Core Server
  2. SERVER-30685

Implement stepdown thread in Python as a resmoke.py hook

    Details

      Description

      A background thread should be added as a resmoke.py hook (i.e. a class to run dedicated logic before a test starts or after a test finishes) to control whether failovers are being triggered on the resmoke.py fixture. We don't want a stepdown to occur while we are validating data consistency (i.e. during ValidateCollections and CheckReplDBHash), so being able to suspend the stepdown thread is a necessary feature.

      • Create buildscripts/resmokelib/testing/hooks/stepdown.py file.
      • Define a _StepdownThread class as a subclass of threading.Thread with a run() method that behaves roughly as follows:

      if isinstance(self.fixture, fixtures.replicaset.ReplicaSetFixture):
          replica_sets = [self.fixture]
      elif isinstance(self.fixture, fixtures.shardedcluster.ShardedClusterFixture):
          # TODO: Check configuration options to see whether 'config_stepdown' and 'shard_stepdown' are
          # both requested.
          replica_sets = [self.fixture.configsvr]
          if self.fixture.num_rs_nodes_per_shard is not None:
              replica_sets.extend(self.fixture.shards)

      while True:
          # self.__should_stop is a threading.Event that is set to ask the thread to exit.
          if self.__should_stop.is_set():
              break

          for replica_set in replica_sets:
              # TODO: Handle exceptions from getting the primary or from trying to step it down.
              client = utils.new_mongo_client(port=replica_set.get_primary().port)
              client.admin.command(bson.SON([
                  ("replSetStepDown", self.stepdown_duration_secs),
                  ("force", True),
              ]))

          # Sleep for the stepdown interval, waking early if the thread is asked to stop.
          self.__should_stop.wait(self.stepdown_interval_ms / 1000.0)

      self.__terminated.set()
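
      The first TODO in the snippet (honoring 'config_stepdown' and 'shard_stepdown') could be resolved with a small helper along these lines. This is only a sketch: the function name select_replica_sets and the shards_are_replica_sets parameter are hypothetical, the latter standing in for the num_rs_nodes_per_shard check above.

```python
def select_replica_sets(configsvr, shards, config_stepdown=True, shard_stepdown=True,
                        shards_are_replica_sets=True):
    """Return the replica sets whose primaries should be stepped down.

    Hypothetical helper: 'configsvr' and 'shards' stand in for the attributes
    of the same name on the sharded cluster fixture.
    """
    replica_sets = []
    if config_stepdown:
        replica_sets.append(configsvr)
    # Only shards that are themselves replica sets have a primary to step down.
    if shard_stepdown and shards_are_replica_sets:
        replica_sets.extend(shards)
    return replica_sets
```

      With both options left at their defaults, the helper returns the config server followed by every shard, which matches what the snippet above does today.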
      

      Note: In addition to a "running" and "terminated" state, the _StepdownThread could have an explicit "suspended" state where it waits on a condition variable until signaled by resmoke.py's job thread to resume triggering failovers. This would avoid spawning and joining a new stepdown thread every time the resmoke.py hook runs.
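
      A minimal sketch of the "suspended" state described in the note, assuming a single threading.Condition guards the state flags. The class name SuspendableThread and the 'action' callable (a stand-in for the actual stepdown logic) are hypothetical and not part of resmoke.py.

```python
import threading


class SuspendableThread(threading.Thread):
    """Hypothetical stand-in for _StepdownThread with a suspended state."""

    def __init__(self, action, interval_secs):
        super().__init__(daemon=True)
        self._action = action            # placeholder for "step down the primaries"
        self._interval_secs = interval_secs
        self._cond = threading.Condition()
        self._suspended = True           # stay idle until resume() is called
        self._stopped = False

    def resume(self):
        with self._cond:
            self._suspended = False
            self._cond.notify_all()

    def suspend(self):
        # Acquiring the condition's lock blocks until any in-progress action
        # has finished, so no failover is running once this method returns.
        with self._cond:
            self._suspended = True
            self._cond.notify_all()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self.join()

    def run(self):
        with self._cond:
            while True:
                # Suspended state: wait (releasing the lock) until resumed or stopped.
                self._cond.wait_for(lambda: self._stopped or not self._suspended)
                if self._stopped:
                    return
                self._action()  # runs while holding the lock
                # Running state: sleep for the interval, waking early on suspend or stop.
                self._cond.wait_for(lambda: self._stopped or self._suspended,
                                    timeout=self._interval_secs)
```

      Running the action while holding the lock is a deliberate simplification: it makes suspend() wait for an in-flight stepdown, at the cost of blocking the caller for as long as that stepdown takes.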

      • Define a ContinuousStepdown class as a subclass of the hooks.interface.CustomBehavior class with a before_test() method that spawns a _StepdownThread instance (or signals it via a condition variable to resume), and an after_test() method to suspend the thread. The configuration options for the ContinuousStepdown class should match what is mentioned in SERVER-30675.

      config_stepdown: boolean (defaults to true)
      election_timeout_ms: number (defaults to 5000, i.e. 5 seconds)
      shard_stepdown: boolean (defaults to true)
      stepdown_duration_secs: number (defaults to 10, i.e. 10 seconds)
      stepdown_interval_ms: number (defaults to 8000, i.e. 8 seconds)
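
      The hook lifecycle and the defaults listed above might fit together roughly as follows. This is a sketch under several assumptions: the class does not subclass the real hooks.interface.CustomBehavior, the method signatures are simplified, the 'trigger_stepdown' callable is a hypothetical placeholder for the stepdown logic, and suspension is done by polling a threading.Event rather than with a condition variable to keep the example short.

```python
import threading


class ContinuousStepdownSketch:
    """Hypothetical hook sketch; not the real resmoke.py class."""

    def __init__(self, trigger_stepdown, config_stepdown=True, election_timeout_ms=5000,
                 shard_stepdown=True, stepdown_duration_secs=10, stepdown_interval_ms=8000):
        self.config_stepdown = config_stepdown
        self.election_timeout_ms = election_timeout_ms
        self.shard_stepdown = shard_stepdown
        self.stepdown_duration_secs = stepdown_duration_secs
        self.stepdown_interval_ms = stepdown_interval_ms
        self._trigger_stepdown = trigger_stepdown  # called with the stepdown duration
        self._active = threading.Event()           # set while failovers are allowed
        self._stop = threading.Event()
        self._in_flight = threading.Lock()         # held while a stepdown is running
        self._thread = None

    def before_test(self):
        # Spawn the thread on first use, then let it trigger failovers.
        if self._thread is None:
            self._stop.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        self._active.set()

    def after_test(self):
        # Suspend failovers and wait for any in-flight stepdown to finish.
        self._active.clear()
        with self._in_flight:
            pass

    def after_suite(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None

    def _run(self):
        while not self._stop.is_set():
            with self._in_flight:
                if self._active.is_set():
                    self._trigger_stepdown(self.stepdown_duration_secs)
            # Convert the interval from milliseconds to seconds, as in run() above.
            self._stop.wait(self.stepdown_interval_ms / 1000.0)
```

      Once after_test() returns, no stepdown is in progress and none will start until the next before_test() call, which is the property the data consistency checks rely on.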
      

              People

              Assignee:
              yves.duhem Yves Duhem
              Reporter:
              max.hirschhorn Max Hirschhorn