[SERVER-42217] Remove unnecessary usage of votes:0 in sharding tests Created: 12/Jul/19  Updated: 29/Jul/19  Resolved: 29/Jul/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexander Taskov (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Won't Fix Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Sprint: Sharding 2019-08-12
Participants:
Linked BF Score: 21

 Description   

https://github.com/mongodb/mongo/blob/master/jstests/sharding/safe_secondary_reads_single_migration_waitForDelete.js#L318

These other tests also do this:

$ gr "votes: 0" jstests/sharding/
jstests/sharding/secondary_shard_version_protocol_with_causal_consistency.js:13:    let rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0, votes: 0}}]};
jstests/sharding/safe_secondary_reads_single_migration_suspend_range_deletion.js:348:    let rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0, votes: 0}}]};
jstests/sharding/balance_repl.js:17:                    {rsConfig: {priority: 0, votes: 0}},
jstests/sharding/balance_repl.js:23:                    {rsConfig: {priority: 0, votes: 0}},
jstests/sharding/unique_index_on_shardservers.js:22:    let newNode = rs.add({'shardsvr': '', rsConfig: {priority: 0, votes: 0}});
jstests/sharding/nonreplicated_uuids_on_shardservers.js:14:    let newNode = rs.add({'shardsvr': '', rsConfig: {priority: 0, votes: 0}});
jstests/sharding/sharding_rs2.js:24:                    {rsConfig: {priority: 0, votes: 0}},
jstests/sharding/sharding_rs2.js:30:                    {rsConfig: {priority: 0, votes: 0}},
jstests/sharding/safe_secondary_reads_drop_recreate.js:535:    let rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0, votes: 0}}]};
jstests/sharding/safe_secondary_reads_single_migration_waitForDelete.js:318:    let rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0, votes: 0}}]}; 
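
For reference, the change this ticket contemplates would look roughly like the sketch below in any of the tests above. This is illustrative only (the ShardingTest constructor arguments are placeholders, not a reviewed patch): dropping votes: 0 while keeping priority: 0 leaves the secondary unelectable but makes it count toward majority write concern.

// Current pattern (from the grep output above): the secondary cannot vote, so a
// w: "majority" write is satisfied by the primary alone and the secondary may lag.
let rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0, votes: 0}}]};

// Sketch of the proposed pattern: keep the secondary unelectable (priority: 0) so
// the primary stays stable, but let it vote (the default) so majority writes must
// reach it before being acknowledged.
// rsOpts = {nodes: [{rsConfig: {votes: 1}}, {rsConfig: {priority: 0}}]};

let st = new ShardingTest({mongos: 1, shards: 2, rs: rsOpts});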



 Comments   
Comment by Esha Maharishi (Inactive) [ 29/Jul/19 ]

Based on the decision made in SERVER-32691 to prefer setting votes to 0 in tests that require a stable primary, I'm closing this as Won't Fix.

Instead, we can prioritize SERVER-34597 to ensure writes done as part of cluster setup are fully replicated before the test begins, and we can ensure tests that make assertions about replication to secondaries use causal consistency.
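
For the second point, a causally consistent session is one way a test can make assertions about secondary reads even when the secondary has votes: 0. A minimal sketch, assuming st is a ShardingTest and the database/collection names are placeholders:

// Start a causally consistent session through mongos; reads routed to a secondary
// wait (via afterClusterTime) until the session's prior writes are visible there.
const session = st.s.startSession({causalConsistency: true});
const sessionDB = session.getDatabase("test");

assert.commandWorked(sessionDB.coll.insert({_id: 1}, {writeConcern: {w: "majority"}}));

// The secondary read blocks until the secondary reflects the insert above.
assert.eq(1, sessionDB.coll.find({_id: 1}).readPref("secondary").itcount());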

Comment by Esha Maharishi (Inactive) [ 29/Jul/19 ]

max.hirschhorn, I see. Is there something we can do to prevent the primary from stepping down while still requiring secondaries to confirm majority writes?

I haven't seen a BF where one of the tests in the description failed due to a write after cluster setup, but since SERVER-34597 only addresses writes done during cluster setup, I didn't see it as a complete solution.

It seems odd that tests must choose between having a stable primary and having secondaries confirm majority writes. It makes writing tests much harder. I mainly wanted to do this ticket to prevent engineers from writing new tests that set votes to 0 but make assertions about the state of secondaries, since such tests would be flaky.
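
To make the flakiness concrete, here is a sketch (hypothetical test code, assuming rst is a ReplSetTest with one voting node and one priority: 0, votes: 0 secondary):

const primaryColl = rst.getPrimary().getDB("test").coll;
const secondary = rst.getSecondary();
secondary.setSlaveOk();  // allow reads on a direct secondary connection (2019-era shell API)
const secondaryColl = secondary.getDB("test").coll;

// The voting majority is the primary alone, so this write is acknowledged
// without the secondary necessarily having applied it.
assert.commandWorked(primaryColl.insert({_id: 1}, {writeConcern: {w: "majority"}}));

// Flaky: the secondary may not have replicated the insert yet.
// assert.eq(1, secondaryColl.find({_id: 1}).itcount());

// Deterministic: explicitly wait for replication before asserting.
rst.awaitReplication();
assert.eq(1, secondaryColl.find({_id: 1}).itcount());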

Comment by Max Hirschhorn [ 29/Jul/19 ]

> max.hirschhorn, this ticket will still rig elections by having only one electable (non-priority: 0) node in each shard.

We've found that setting priority=0 and votes=1 may still lead to a primary choosing to step down if it cannot heartbeat the other nodes quickly enough.

Could you please clarify why you want to make these changes to the jstests/sharding/ tests instead of updating ShardingTest to use the getShardVersion command (or whatever mechanism ends up being used to wait for the sharding state to have been initialized), as described in SERVER-34597?

Comment by Esha Maharishi (Inactive) [ 29/Jul/19 ]

max.hirschhorn, this ticket will still rig elections by having only one electable (non-priority: 0) node in each shard.

You are right that a majority write in general may not be visible on all secondaries, but the tests in question only have two-node replica sets (see example from the linked BF).

Comment by Max Hirschhorn [ 14/Jul/19 ]

We have historically had issues with spurious elections when secondaries are configured to have a vote (see SERVER-31670, SERVER-30642, SERVER-32468, and SERVER-32688). My personal view is that tests which aren't specifically about testing the election protocol, majority commit point propagation, etc. should force stable topologies by rigging elections.

I'd also like to point out that if we were running a 3-node replica set shard and all members had a vote, a majority-committed write wouldn't be guaranteed to be visible on every secondary, because we could always read from the lagged one. My understanding of SERVER-34597 is that ShardingTest in the mongo shell and ShardedClusterFixture in resmoke.py would explicitly wait for all members of all shards to be ready.
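
Until SERVER-34597 lands, a test can approximate that waiting itself. A rough sketch, assuming st is a ShardingTest whose two shards are replica sets exposed as st.rs0 and st.rs1:

// Wait for every shard's secondaries to reach SECONDARY state and catch up on the
// setup writes before the test body starts asserting about secondary state.
[st.rs0, st.rs1].forEach(rs => {
    rs.awaitSecondaryNodes();
    rs.awaitReplication();
});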
