[SERVER-7156] w:majority issues with votes Created: 25/Sep/12  Updated: 10/Dec/14  Resolved: 19/Aug/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Richard Kreuter (Inactive) Assignee: Eric Milkie
Resolution: Done Votes: 10
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-7681 Report majority number in ReplSetGetS... Closed
is related to SERVER-3110 unlimited nonvoting nodes in replica set Closed
is related to SERVER-10513 deprecate votes in replica set config Closed
is related to SERVER-5218 Add a write concern to wait for journ... Closed
Operating System: ALL
Participants:

 Description   

If you have a weird distribution of votes in a replica set, a write can succeed with w:majority yet disappear after a failover. For example:

{ _id : "myrs",
  members : [ { _id : "0", host : "A", votes: 1 },
              { _id : "1", host : "B", votes: 1 },
              { _id : "2", host : "C", votes: 3 } ] }

Suppose a w:majority write reaches "A" and "B" but not "C", so the client gets confirmation; then "A" and "B" both fail simultaneously. "C" will elect itself primary (it holds 3 of the set's 5 votes) but will not have the write.
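The mismatch can be made concrete with a small back-of-the-envelope calculation (illustrative Python, not server code): w:majority counts acknowledging nodes, while elections count votes, and with a skewed vote distribution the two majorities diverge.

```python
# Illustrative sketch (not server code): with the config above,
# the write majority counts *nodes* but elections count *votes*.

members = [
    {"_id": 0, "host": "A", "votes": 1},
    {"_id": 1, "host": "B", "votes": 1},
    {"_id": 2, "host": "C", "votes": 3},
]

write_majority = len(members) // 2 + 1            # nodes needed: 2
total_votes = sum(m["votes"] for m in members)    # 5 votes in the set
election_majority = total_votes // 2 + 1          # votes needed: 3

# A and B acknowledging satisfies w:majority (2 >= 2), yet C alone
# holds an election majority (3 >= 3) and can become primary
# without ever having seen the write.
print(write_majority, election_majority)  # 2 3
```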

If the goal of w:majority is to give people the guarantee that the write will remain despite any failure that nonetheless leaves the replica set with a primary, then w:majority should be vote-aware.

Alternatively, we should get rid of votes.

EDIT: We will be deprecating member votes that are not 0 or 1.



 Comments   
Comment by Eric Milkie [ 19/Aug/13 ]

Correction: we will only deprecate votes > 1 (see linked ticket), and no further work will be done for this ticket.

Comment by Eric Milkie [ 14/Aug/13 ]

For 2.6, votes will be deprecated. -For this ticket, w:majority will treat every node with votes > 1 as votes = 1 for that node.-

Comment by Eric Milkie [ 30/Jul/13 ]

How should this behave with arbiters?

The current behavior is somewhat complicated: for a majority, it uses either (half the number of nodes + 1) or (the total number of non-arbiters), whichever is smaller.

For the new behavior, we can use:
half the total number of votes from electable members, plus 1. An electable member is a member with priority > 0 that is not an arbiter. Note that all members will be treated as having either 0 or 1 vote.
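The two rules can be compared in a small sketch (hypothetical Python; field names follow replica set config conventions, but this is not the server's implementation):

```python
def current_majority(members):
    # Current rule: min(half the nodes + 1, number of non-arbiters).
    non_arbiters = sum(1 for m in members if not m.get("arbiterOnly"))
    return min(len(members) // 2 + 1, non_arbiters)

def proposed_majority(members):
    # Proposed rule: half the votes from electable members + 1, where
    # "electable" means priority > 0 and not an arbiter, and every
    # member's vote is clamped to 0 or 1.
    electable_votes = sum(
        min(m.get("votes", 1), 1)
        for m in members
        if m.get("priority", 1) > 0 and not m.get("arbiterOnly")
    )
    return electable_votes // 2 + 1

# Primary-secondary-arbiter set: both rules require 2 acknowledgements.
psa = [{}, {}, {"arbiterOnly": True, "priority": 0}]
print(current_majority(psa), proposed_majority(psa))  # 2 2
```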

Comment by Spencer Brody (Inactive) [ 06/May/13 ]

This is also a problem if you set votes to zero. Imagine a 5-node replica set with 3 nodes in one DC and 2 in the other. If your main DC goes down, one way to make the other DC elect a primary would be to do a forced reconfigure and set votes:0 on the 3 nodes from the main DC, which are all down. That would successfully cause the set to think it has a majority and promote one of the two remaining nodes to primary. If, however, you then do a write with w:majority, it will time out, as it will still consider the votes:0 nodes part of the majority needed for write acknowledgement.
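That failure mode can be sketched numerically (illustrative only; it assumes, as described above, that w:majority counts all members regardless of their votes):

```python
# 5-node set after the forced reconfig: the 3 down nodes in the main
# DC now carry votes: 0, the 2 surviving nodes keep votes: 1.
members = (
    [{"host": f"dc1-{i}", "votes": 0, "up": False} for i in range(3)]
    + [{"host": f"dc2-{i}", "votes": 1, "up": True} for i in range(2)]
)

election_majority = sum(m["votes"] for m in members) // 2 + 1  # 2 votes
write_majority = len(members) // 2 + 1                         # 3 nodes
reachable = sum(1 for m in members if m["up"])                 # 2 up

# An election can succeed (2 reachable votes >= 2), but w:majority
# still requires 3 acknowledging nodes and only 2 exist, so the
# write blocks until it times out.
print(election_majority, write_majority, reachable)  # 2 3 2
```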

Comment by Christopher Price [ 16/Oct/12 ]

+1 In my use case I have a 3-node set: 1 primary, 1 visible secondary, and 1 hidden secondary. This feature would solve TWO problems for me.

Problem #1 = As described above, a replica-safe write really isn't safe if it is only written to hidden nodes or nodes that can "never" get elected. (We use REPLICAS_SAFE.)

Problem #2 = Throttling writes. When the replication chain looks like this:
Visible secondary syncs from hidden secondary
Hidden secondary syncs from primary
there is a throttling problem. Our application occasionally has massive spikes of writes alongside long-running reads. During these spikes, our writes (with w=2) get applied to the primary and hidden nodes, and the application immediately processes the next write. But because the visible secondary is serving some long-running reads, there is a lot of locking/unlocking/yielding, which has several times led to replication lag on the visible secondary, and sometimes to the visible secondary becoming completely unresponsive.

I estimate that this ticket would solve about 90% of our Mongo-based outages.

Perhaps this could be a REPLICAS_SAFE replica set configuration option? Something that ensures that non-electable nodes do not sync directly off of the primary (unless it is the only source available) and that visible secondaries never sync off of hidden or non-electable nodes (the chain should always go through a primary or a visible secondary with the same or a greater number of votes).
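A rough sketch of the sync-source rule this comment suggests (purely hypothetical; the function and field names are invented for illustration, and no such configuration option exists):

```python
def allowed_sync_sources(member, members, primary):
    """Filter sync sources per the rule sketched above (hypothetical)."""
    def electable(m):
        return not m.get("hidden") and m.get("priority", 1) > 0

    sources = [
        src for src in members
        if src is not member
        # Visible, electable secondaries only chain from the primary
        # or from other visible, electable secondaries.
        and (not electable(member) or electable(src))
    ]
    # Non-electable nodes avoid the primary when another source is available.
    if not electable(member):
        non_primary = [s for s in sources if s is not primary]
        if non_primary:
            return non_primary
    return sources

primary = {"host": "P"}
visible = {"host": "V"}
hidden = {"host": "H", "hidden": True, "priority": 0}
rs = [primary, visible, hidden]

print([s["host"] for s in allowed_sync_sources(visible, rs, primary)])  # ['P']
print([s["host"] for s in allowed_sync_sources(hidden, rs, primary)])   # ['V']
```

Under this rule, the problematic chain from Problem #2 (visible syncing from hidden) is disallowed, and the hidden node stops competing with clients for the primary's attention except when nothing else is up.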

Generated at Thu Feb 08 03:13:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.