[SERVER-30685] Implement stepdown thread in Python as a resmoke.py hook Created: 16/Aug/17  Updated: 30/Oct/23  Resolved: 01/Sep/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.5.13

Type: New Feature Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Yves Duhem
Resolution: Fixed Votes: 0
Labels: sharding36-passthrough-testing, tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-31194 Add a version of retryable_writes_jsc... Closed
Backwards Compatibility: Fully Compatible
Sprint: TIG 2017-09-11
Participants:

 Description   

A background thread should be added as a resmoke.py hook (i.e. a class to run dedicated logic before a test starts or after a test finishes) to control whether failovers are being triggered on the resmoke.py fixture. We don't want a stepdown to occur while we are validating data consistency (i.e. during ValidateCollections and CheckReplDBHash), so being able to suspend the stepdown thread is a necessary feature.

  • Create buildscripts/resmokelib/testing/hooks/stepdown.py file.
  • Define a _StepdownThread class as a subclass of threading.Thread with a run() method that behaves roughly as follows:

# Assumes self.__should_stop and self.__terminated are threading.Event instances.
if isinstance(self.fixture, fixtures.replicaset.ReplicaSetFixture):
    replica_sets = [self.fixture]
elif isinstance(self.fixture, fixtures.shardedcluster.ShardedClusterFixture):
    # TODO: Check configuration options to see whether 'config_stepdown' and 'shard_stepdown' are
    # both requested.
    replica_sets = [self.fixture.configsvr]
    if self.fixture.num_rs_nodes_per_shard is not None:
        replica_sets.extend(self.fixture.shards)

while True:
    if self.__should_stop.is_set():
        break

    for replica_set in replica_sets:
        # TODO: Handle exceptions from getting the primary or from trying to step it down.
        client = utils.new_mongo_client(port=replica_set.get_primary().port)
        client.admin.command(bson.SON([
            ("replSetStepDown", self.stepdown_duration_secs),
            ("force", True),
        ]))

    # Sleep until the next stepdown, waking early if the thread is asked to stop.
    self.__should_stop.wait(self.stepdown_interval_ms / 1000.0)

self.__terminated.set()

Note: In addition to a "running" and a "terminated" state, the _StepdownThread could have an explicit "suspended" state in which it waits on a condition variable until resmoke.py's job thread signals it to resume triggering failovers. This would avoid spawning and joining the stepdown thread every time the resmoke.py hook runs.
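
The "suspended" state described in the note could be sketched roughly as below. This is only an illustration of the condition-variable idea, not the implementation that was committed: the class name PausableThread, the action callback, and the resume()/pause()/stop() methods are all hypothetical names chosen for the example.

```python
import threading


class PausableThread(threading.Thread):
    """Hypothetical helper: calls `action` periodically unless paused or stopped."""

    def __init__(self, action, interval_secs):
        super().__init__(daemon=True)
        self._action = action
        self._interval_secs = interval_secs
        self._cond = threading.Condition()
        self._paused = True  # Start suspended until resume() is called.
        self._stopped = False

    def run(self):
        with self._cond:
            while True:
                # Suspended state: block until resume() or stop() notifies us.
                while self._paused and not self._stopped:
                    self._cond.wait()
                if self._stopped:
                    return
                # The action runs while the lock is held, so pause() cannot
                # return while an action is still in flight.
                self._action()
                # Wait out the interval (the lock is released while waiting);
                # return early only if stop() was called.
                self._cond.wait_for(lambda: self._stopped,
                                    timeout=self._interval_secs)

    def resume(self):
        with self._cond:
            self._paused = False
            self._cond.notify_all()

    def pause(self):
        with self._cond:
            self._paused = True
            self._cond.notify_all()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self.join()
```

Because the action runs with the lock held, pause() returns only once no action is in progress, which is the property needed before ValidateCollections or CheckReplDBHash run. A real stepdown action would also need the exception handling mentioned in the TODO above.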

  • Define a ContinuousStepdown class as a subclass of the hooks.interface.CustomBehavior class with a before_test() method that spawns a _StepdownThread instance (or signals it via a condition variable to resume), and an after_test() method to suspend the thread. The configuration options for the ContinuousStepdown class should match what is mentioned in SERVER-30675.

config_stepdown: boolean (defaults to true)
election_timeout_ms: number (defaults to 5 seconds)
shard_stepdown: boolean (defaults to true)
stepdown_duration_secs: number (defaults to 10 seconds)
stepdown_interval_ms: number (defaults to 8 seconds)
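
As a rough sketch of how the hook and its options might fit together, the example below stores the defaults listed above (millisecond options expressed in milliseconds) and resumes or suspends a thread-like object around each test. Everything here is an assumption made for illustration: StepdownOptions and ContinuousStepdownSketch are hypothetical names, the real hook would subclass hooks.interface.CustomBehavior, the before_test()/after_test() signatures in resmoke.py may differ, and the thread object is assumed to expose resume() and pause().

```python
from dataclasses import dataclass


@dataclass
class StepdownOptions:
    """Hypothetical container; defaults mirror the option list above."""
    config_stepdown: bool = True
    election_timeout_ms: int = 5000      # 5 seconds
    shard_stepdown: bool = True
    stepdown_duration_secs: int = 10
    stepdown_interval_ms: int = 8000     # 8 seconds

    @property
    def stepdown_interval_secs(self):
        # Convert to seconds for APIs such as threading.Event.wait().
        return self.stepdown_interval_ms / 1000.0


class ContinuousStepdownSketch:
    """Stand-in for the hook; not a subclass of the real CustomBehavior."""

    def __init__(self, stepdown_thread, **options):
        # Unknown option names raise TypeError from the dataclass constructor.
        self.options = StepdownOptions(**options)
        self._thread = stepdown_thread

    def before_test(self, test=None):
        # Let failovers happen while the test body runs.
        self._thread.resume()

    def after_test(self, test=None):
        # Suspend failovers so that later data consistency checks are not
        # interrupted by a stepdown.
        self._thread.pause()
```

Rejecting unknown option names at construction time is a design choice of this sketch; it surfaces typos in a suite's hook configuration early rather than silently ignoring them.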



 Comments   
Comment by Githook User [ 01/Sep/17 ]

Author: Yves Duhem (syev) <yves.duhem@mongodb.com>

Message: SERVER-30685 New continuous stepdown hook
Branch: master
https://github.com/mongodb/mongo/commit/c30149da2ea0af52a1532017550431ac356f04f3
