[SERVER-31670] Change replica set fixture used by replica_sets_jscore_passthrough to make its secondary have zero votes Created: 20/Oct/17  Updated: 30/Oct/23  Resolved: 09/Jan/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.19, 3.4.11, 3.6.3, 3.7.1

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Kevin Albertson
Resolution: Fixed Votes: 0
Labels: tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-32691 Create passthrough for w="majority" w... Closed
related to SERVER-32774 Ensure change_streams_secondary_reads... Closed
related to SERVER-32468 Use a 1-node CSRS in non-stepdown sha... Closed
related to SERVER-32572 Run causally consistent resmoke suite... Closed
related to SERVER-44214 Give replica set secondaries votes in... Closed
is related to SERVER-32688 FSM replication suites should give se... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v3.6, v3.4, v3.2
Sprint: TIG 2018-1-15
Participants:
Linked BF Score: 0

 Description   

Giving the secondary zero votes prevents unexpected failovers caused by machine slowness. We could apply this to any passthrough suite that runs against a replica set and expects a stable topology for the duration of the test.

This change affects every suite that meets all of the following criteria:

  1. uses a ReplicaSetFixture or ShardedClusterFixture (which uses a ReplicaSetFixture)
  2. has more than one node in the replica set
  3. does not specify voting_secondaries: true
  4. does not specify all_nodes_electable: true

Namely, these suites:

  • aggregation_read_concern_majority_passthrough
  • integration_tests_replset
  • jstestfuzz_replication
  • jstestfuzz_replication_initsync
  • jstestfuzz_replication_resync
  • jstestfuzz_replication_session
  • read_concern_linearizable_passthrough
  • read_concern_majority_passthrough
  • replica_sets_initsync_jscore_passthrough
  • replica_sets_initsync_static_jscore_passthrough
  • replica_sets_jscore_passthrough
  • replica_sets_resync_static_jscore_passthrough
  • retryable_writes_jscore_passthrough
  • change_streams_secondary_reads
  • jstestfuzz_sharded_continuous_stepdown
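
The four criteria above can be sketched as a small predicate. This is only an illustration, not resmoke.py code: the function name and the flat dictionary layout (including the fixture_class and num_nodes keys) are hypothetical, and only voting_secondaries and all_nodes_electable are option names taken from this ticket.

```python
# Fixture classes named in the criteria. ShardedClusterFixture counts because
# it uses a ReplicaSetFixture internally.
REPLICA_SET_BASED_FIXTURES = {"ReplicaSetFixture", "ShardedClusterFixture"}


def gets_nonvoting_secondaries(fixture):
    """Return True if a suite's fixture settings meet all four criteria.

    `fixture` is a hypothetical flat dict describing a suite's fixture; a real
    suite definition may nest or name these settings differently.
    """
    uses_replica_set = fixture.get("fixture_class") in REPLICA_SET_BASED_FIXTURES
    has_secondaries = fixture.get("num_nodes", 1) > 1
    # Either option set to true opts the suite out of the new default.
    opts_out = bool(
        fixture.get("voting_secondaries") or fixture.get("all_nodes_electable")
    )
    return uses_replica_set and has_secondaries and not opts_out
```

For example, a two-node ReplicaSetFixture with neither option set would qualify, while the same fixture with voting_secondaries: true would not.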


 Comments   
Comment by Githook User [ 19/Jan/18 ]

Author: Kevin Albertson <kevin.albertson@10gen.com> (kevinAlbs)

Message: SERVER-31670 fixture secondary defaults to 0 votes

(cherry picked from commit af6bd1d39b42cc3d99c5854c6e280df31e858442)

Also includes the following related fix:
SERVER-32774 ensure change_streams_secondary_reads has voting
secondaries

(cherry picked from commit 60b8011a2f76d14283076a94eac4a9dbec5838bc)
Branch: v3.6
https://github.com/mongodb/mongo/commit/af13b835dfc57b60a00987e993a6eec86a3e1653

Comment by Spencer Brody (Inactive) [ 13/Jan/18 ]

kevin.albertson, can you update this ticket's summary/description to make clear what set of suites were actually changed?

Comment by William Schultz (Inactive) [ 12/Jan/18 ]

max.hirschhorn kevin.albertson Would it be reasonable to make a similar change to the FSM replication suites?

Comment by Githook User [ 11/Jan/18 ]

Author: Kevin Albertson <kevin.albertson@10gen.com> (kevinAlbs)

Message: SERVER-31670 fixture secondary defaults to 0 votes

(cherry picked from commit af6bd1d39b42cc3d99c5854c6e280df31e858442)
Branch: v3.2
https://github.com/mongodb/mongo/commit/6bc007607ba290a1ebee316a912ea074132f8e98

Comment by Githook User [ 08/Jan/18 ]

Author: Kevin Albertson <kevin.albertson@10gen.com> (kevinAlbs)

Message: SERVER-31670 fixture secondary defaults to 0 votes
Branch: master
https://github.com/mongodb/mongo/commit/af6bd1d39b42cc3d99c5854c6e280df31e858442

Comment by Max Hirschhorn [ 27/Dec/17 ]

Since config servers should be able to handle failover in passthroughs, it might be good to exclude them from this.

judah.schvimer, while I agree that during the test execution itself mongos ought to transparently handle any network errors due to an election, there are a few cases where resmoke.py has the mongo shell connect directly to the config server replica set. One such case is performing the data consistency checks. We've previously seen enough cases where the primary steps down immediately after the "fsyncUnlock" command once the dbhash check finishes (i.e. once stepdown can acquire the global X lock) that I'm inclined to set votes=0 on the secondaries of all replica sets, including the config server replica set, as a way to reduce the number of failures due to an unexpected election.

Per our Slack conversation, I also have a possibly more outlandish idea of having passthrough suites that aren't stepping down the CSRS primary use a 1-node replica set as the CSRS, but we'll save that for SERVER-32468.

Comment by Judah Schvimer [ 27/Dec/17 ]

Since config servers should be able to handle failover in passthroughs, it might be good to exclude them from this.

Comment by Max Hirschhorn [ 26/Dec/17 ]

To implement this behavior in resmoke.py, we should be able to just change the default value of voting_secondaries to false when all_nodes_electable=false.

if i > 0:
    if not self.all_nodes_electable:
        member_info["priority"] = 0
    if i >= 7 or not self.voting_secondaries:
        # Only 7 nodes in a replica set can vote, so the other members must still be
        # non-voting when this fixture is configured to have voting secondaries.
        member_info["votes"] = 0

https://github.com/mongodb/mongo/blob/43b1a2984a4c7b9d1fbc9e4d0d6596c6f626ffda/buildscripts/resmokelib/testing/fixtures/replicaset.py#L101-L107
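
As a rough, self-contained sketch of the proposal above (not the actual resmoke.py change), the excerpt can be wrapped in a function where voting_secondaries falls back to the value of all_nodes_electable when it is left unset. The function name build_members, the host/port values, and the use of None as the "unset" marker are assumptions made for illustration; so is keeping secondaries voting by default when all_nodes_electable is true.

```python
def build_members(num_nodes, all_nodes_electable=False, voting_secondaries=None):
    """Build replica set member documents for a hypothetical fixture."""
    if voting_secondaries is None:
        # Assumed default: secondaries vote only if they can also be elected.
        voting_secondaries = all_nodes_electable

    members = []
    for i in range(num_nodes):
        # Placeholder host names; a real fixture would use its own ports.
        member_info = {"_id": i, "host": "localhost:%d" % (20000 + i)}
        if i > 0:
            if not all_nodes_electable:
                member_info["priority"] = 0
            if i >= 7 or not voting_secondaries:
                # Only 7 nodes in a replica set can vote, so the other members
                # must still be non-voting even with voting secondaries.
                member_info["votes"] = 0
        members.append(member_info)
    return members
```

With these assumptions, build_members(2) yields a secondary with both priority 0 and votes 0, while build_members(2, voting_secondaries=True) keeps the secondary's vote and only sets its priority to 0.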
