[SERVER-34299] Require nodes with slaveDelay to have votes:0
Created: 04/Apr/18
Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Spencer Brody (Inactive)
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

Voting slaveDelay nodes still count toward the number of acknowledgements needed for w:majority, and can therefore slow down w:majority write confirmation. We should probably require slaveDelay nodes to be non-voting.
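
To illustrate the arithmetic, here is a minimal Python sketch, not server code. It assumes a simplified model in which w:majority needs acknowledgement from a strict majority of voting members. The `lag_secs` field and the `majority_ack_latency` helper are hypothetical names invented for this illustration; `math.inf` stands in for an unreachable node.

```python
import math


def majority_ack_latency(members):
    """Return how long a w:majority write waits in this simplified model.

    The write is confirmed once a strict majority of the voting members
    have it, so the wait is the lag of the slowest member in the fastest
    majority.
    """
    voting_lags = sorted(m["lag_secs"] for m in members if m["votes"] > 0)
    needed = len(voting_lags) // 2 + 1
    return voting_lags[needed - 1]


# Four-member set: one ordinary secondary is unreachable, and one secondary
# is delayed by an hour. All field values here are illustrative.
members = [
    {"host": "a", "votes": 1, "slaveDelay": 0, "lag_secs": 0},  # primary
    {"host": "b", "votes": 1, "slaveDelay": 0, "lag_secs": 1},
    {"host": "c", "votes": 1, "slaveDelay": 0, "lag_secs": math.inf},  # down
    {"host": "d", "votes": 1, "slaveDelay": 3600, "lag_secs": 3600},
]

# With the delayed node voting, 3 of 4 voters must acknowledge,
# so the write waits on the delayed node.
all_voting = majority_ack_latency(members)

# With votes:0 on the delayed node, 2 of 3 voters suffice.
delayed_non_voting = [
    dict(m, votes=0) if m["slaveDelay"] > 0 else m for m in members
]
without_delayed_vote = majority_ack_latency(delayed_non_voting)
```

In this model the write waits 3600 seconds while the delayed node votes and 1 second once it has votes:0.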

On upgrade, we'll require users to explicitly specify votes:0, because a voting slaveDelay node may currently be acting as a de facto arbiter.



 Comments   
Comment by Andy Schwerin [ 22/Jun/20 ]

What if 4.6 binaries abort when they load a config with delayed secondaries that vote, and refuse to accept configs via the gossip protocol with them? Then, we just document it, and people will learn on upgrade if they failed to read the docs? It would show up in the logs of processes that aborted.

Comment by Eric Milkie [ 18/Jun/20 ]

The replica set config is not necessarily known at startup; it only becomes known at config load time, and the config might be gossiped to the node long after it has started up.
I don't think (1) is a viable option because you can't replace a config without running a reconfig command on the primary, and there is no guarantee that a primary is even around. A force reconfig might be dangerous.
I think (4) is the best option here.

Comment by A. Jesse Jiryu Davis [ 18/Jun/20 ]

Question about upgrade/downgrade: If a 4.4 replica set has a slaveDelay node with votes > 0, and we upgrade the set to 4.6, what do we do with the slaveDelay node's "votes" value in the config?

  1. On startup with binary version 4.6, replace the config with a new one where the slaveDelay node's votes = 0?
  2. Leave it nonzero but behave as if it's 0?
  3. Leave it nonzero, behave like 4.4, and warn the user they should set it to 0?
  4. Leave it nonzero, behave like 4.4, and when the user calls setFeatureCompatibilityVersion("4.6"), fail and tell the user they must set it to 0?
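
A rough Python sketch of what option 4 might look like, modelling the replica set config as a plain dict rather than using any server or driver API. The helper names `delayed_voters` and `check_fcv_upgrade` are hypothetical; the sketch assumes the config defaults of votes = 1 and slaveDelay = 0 when a field is omitted.

```python
def delayed_voters(config):
    """Return hosts of members that have a slaveDelay and still vote."""
    return [
        m["host"]
        for m in config["members"]
        if m.get("slaveDelay", 0) > 0 and m.get("votes", 1) > 0
    ]


def check_fcv_upgrade(config, target_fcv):
    """Refuse the upgrade to FCV 4.6 while any delayed member still votes."""
    offenders = delayed_voters(config)
    if target_fcv == "4.6" and offenders:
        raise ValueError(
            "slaveDelay members must have votes:0 before upgrading: "
            + ", ".join(offenders)
        )


# Illustrative config: the third member is delayed and votes by default.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "a:27017"},
        {"_id": 1, "host": "b:27017"},
        {"_id": 2, "host": "c:27017", "priority": 0, "slaveDelay": 3600},
    ],
}
```

Here `check_fcv_upgrade(config, "4.6")` raises until the user sets votes to 0 on the delayed member, which matches the "fail and tell the user" behavior described in option 4.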

Comment by Eric Milkie [ 01/Aug/19 ]

Since voting slaveDelay nodes may affect the behavior of flow control, I think we should implement this ticket soon.

Comment by Spencer Brody (Inactive) [ 05/Apr/18 ]

Correct

Comment by Andy Schwerin [ 05/Apr/18 ]

I'm surprised we didn't cover this in the past. Let me guess: they have to have priority 0, but we still let them vote?
