-
Type: New Feature
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Testing Infrastructure
-
Fully Compatible
-
TIG 2017-09-11
A background thread should be added as a resmoke.py hook (i.e. a class to run dedicated logic before a test starts or after a test finishes) to control whether failovers are being triggered on the resmoke.py fixture. We don't want a stepdown to occur while we are validating data consistency (i.e. during ValidateCollections and CheckReplDBHash), so being able to suspend the stepdown thread is a necessary feature.
- Create the buildscripts/resmokelib/testing/hooks/stepdown.py file.
- Define a _StepdownThread class as a subclass of threading.Thread with a run() method that behaves roughly as follows:
    if isinstance(self.fixture, fixtures.replicaset.ReplicaSetFixture):
        replica_sets = [self.fixture]
    elif isinstance(self.fixture, fixtures.shardedcluster.ShardedClusterFixture):
        # TODO: Check configuration options to see whether 'config_stepdown' and 'shard_stepdown' are
        # both requested.
        replica_sets = [self.fixture.configsvr]
        if self.fixture.num_rs_nodes_per_shard is not None:
            replica_sets.extend(self.fixture.shards)

    while True:
        # self.__should_stop is a threading.Event that is set to request termination.
        if self.__should_stop.is_set():
            break
        for replica_set in replica_sets:
            # TODO: Handle exceptions from getting the primary or from trying to step it down.
            client = utils.new_mongo_client(port=replica_set.get_primary().port)
            client.admin.command(bson.SON([
                ("replSetStepDown", self.stepdown_duration_secs),
                ("force", True),
            ]))
        # Sleep between stepdowns, waking early if termination is requested.
        self.__should_stop.wait(self.stepdown_interval_ms / 1000.0)
    self.__terminated.set()
Note: In addition to a "running" and a "terminated" state, the _StepdownThread could have an explicit "suspended" state in which it waits on a condition variable until signaled by resmoke.py's job thread to resume triggering failovers. This would avoid spawning and joining the stepdown thread every time the resmoke.py hook runs.
- Define a ContinuousStepdown class as a subclass of the hooks.interface.CustomBehavior class with a before_test() method that spawns a _StepdownThread instance (or signals it via a condition variable to resume), and an after_test() method to suspend the thread. The configuration options for the ContinuousStepdown class should match what is mentioned in SERVER-30675:
    config_stepdown: boolean (defaults to true)
    election_timeout_ms: number (defaults to 5 seconds)
    shard_stepdown: boolean (defaults to true)
    stepdown_duration_secs: number (defaults to 10 seconds)
    stepdown_interval_ms: number (defaults to 8 seconds)
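The suspend/resume design described in the note and in the ContinuousStepdown step could look roughly like the sketch below. It is not the resmoke.py implementation: the class names SuspendableStepdownThread and ContinuousStepdownSketch, the injected step_down callable (standing in for the replSetStepDown call), the after_suite() teardown method, and the argument-less hook methods are all assumptions made to keep the example self-contained and free of a MongoDB dependency.

```python
import threading


class SuspendableStepdownThread(threading.Thread):
    """Hypothetical stand-in for _StepdownThread with an explicit suspended state."""

    def __init__(self, step_down, stepdown_interval_ms=8000):
        super().__init__(daemon=True)
        self._step_down = step_down  # assumed callable that triggers one round of stepdowns
        self._interval_secs = stepdown_interval_ms / 1000.0
        self._cond = threading.Condition()
        self._suspended = True  # start suspended until the hook resumes the thread
        self._stopped = False
        self._in_stepdown = False

    def run(self):
        while True:
            with self._cond:
                # Block here for as long as the thread is suspended.
                while self._suspended and not self._stopped:
                    self._cond.wait()
                if self._stopped:
                    return
                self._in_stepdown = True
            try:
                self._step_down()
            finally:
                with self._cond:
                    self._in_stepdown = False
                    self._cond.notify_all()
            with self._cond:
                # Sleep for the interval, waking early on suspend or stop.
                self._cond.wait_for(lambda: self._suspended or self._stopped,
                                    timeout=self._interval_secs)

    def resume(self):
        with self._cond:
            self._suspended = False
            self._cond.notify_all()

    def suspend(self):
        with self._cond:
            self._suspended = True
            self._cond.notify_all()
            # Do not return while a stepdown is still in flight.
            self._cond.wait_for(lambda: not self._in_stepdown)

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self.join()


class ContinuousStepdownSketch:
    """Hypothetical hook; the real one would subclass hooks.interface.CustomBehavior."""

    def __init__(self, step_down, stepdown_interval_ms=8000):
        self._thread = SuspendableStepdownThread(step_down, stepdown_interval_ms)
        self._started = False

    def before_test(self):
        if not self._started:  # spawn the thread once, then only signal it
            self._thread.start()
            self._started = True
        self._thread.resume()

    def after_test(self):
        self._thread.suspend()

    def after_suite(self):  # assumed teardown point, not named in this ticket
        if self._started:
            self._thread.stop()
```

In this sketch suspend() returns only once no stepdown is in flight, so hooks that run after the test (such as ValidateCollections or CheckReplDBHash) would not race with a failover triggered by this thread. The thread is spawned once and parked on the condition variable between tests rather than being joined and recreated.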
- is depended on by
-
SERVER-31194 Add a version of retryable_writes_jscore_passthrough.yml with stepdowns
- Closed