[SERVER-7156] w:majority issues with votes Created: 25/Sep/12 Updated: 10/Dec/14 Resolved: 19/Aug/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Richard Kreuter (Inactive) | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 10 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL |
| Participants: | |
| Description |
|
If you have a weird distribution of votes in a system, you can have a situation where a write succeeds with w:majority but disappears after a failover. For example:
Suppose a w:majority write reaches "A" and "B" but not "C", so the client gets the confirmation; then "A" and "B" both fail simultaneously. If "C" holds enough votes to win an election on its own, it will elect itself primary without having the write. If the goal of w:majority is to guarantee that a write will survive any failure that nonetheless leaves the replica set with a primary, then w:majority should be vote-aware. Alternatively, we should get rid of votes. EDIT: We will be deprecating member votes that are not 0 or 1. |
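The gap described above can be sketched in a few lines of Python. The vote assignments below are hypothetical (the ticket only says the distribution is "weird"); the point is that w:majority counts acknowledging nodes, while elections count votes:

```python
# Hypothetical vote assignments: a 3-node set where C alone
# holds a majority of the votes.
members = {"A": 1, "B": 1, "C": 3}  # member -> votes

def node_majority(members):
    """Majority by node count: how w:majority counts acknowledgements."""
    return len(members) // 2 + 1

def vote_majority(members):
    """Majority by votes: what winning an election actually requires."""
    return sum(members.values()) // 2 + 1

acked = {"A", "B"}  # the write reached A and B, but not C

# w:majority is satisfied: 2 acks meet the node majority of 2 ...
assert len(acked) >= node_majority(members)

# ... yet C's 3 votes alone meet the vote majority of 3, so C can
# win an election without the write, and the write is rolled back.
assert members["C"] >= vote_majority(members)
```

A vote-aware w:majority would instead require the acknowledging members to hold a majority of the votes, not merely to be a majority of the nodes.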
| Comments |
| Comment by Eric Milkie [ 19/Aug/13 ] |
|
Correction: we will only deprecate votes > 1 (see linked ticket), and no further work will be done for this ticket. |
| Comment by Eric Milkie [ 14/Aug/13 ] |
|
For 2.6, votes will be deprecated. |
| Comment by Eric Milkie [ 30/Jul/13 ] |
|
How should this behave with arbiters? The current behavior is somewhat complicated. For a majority, it uses either (half the number of nodes + 1), or (the total number of non-arbiters), whichever is fewer. For the new behavior, we can do: |
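The current rule described in this comment can be sketched as follows (the function name is hypothetical, but the arithmetic is the "whichever is fewer" rule stated above):

```python
def write_majority(total_nodes, arbiters):
    """Current behavior: the majority used for write acknowledgement is
    the smaller of a strict node majority and the number of
    data-bearing (non-arbiter) members."""
    return min(total_nodes // 2 + 1, total_nodes - arbiters)

# A PSA set (primary, secondary, arbiter): node majority is 2 and there
# are 2 non-arbiters, so w:majority needs both data-bearing nodes.
assert write_majority(3, 1) == 2

# 5 nodes with 2 arbiters: node majority is 3, and only 3 members can
# hold data, so the effective majority is still 3 -- every data-bearing
# node must acknowledge the write.
assert write_majority(5, 2) == 3
```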
| Comment by Spencer Brody (Inactive) [ 06/May/13 ] |
|
This is also a problem if you set votes to zero. Imagine a 5-node replica set with 3 nodes in one DC and 2 in the other. If your main DC goes down, one way to make the other DC elect a primary would be to do a forced reconfigure and set votes:0 on the 3 nodes from the main DC, which are all down. That would successfully cause the set to think it has a majority and promote one of the two remaining nodes to primary. If, however, you then do a write with w:majority, it will time out, because it still counts the votes:0 nodes toward the majority needed for write acknowledgement. |
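The mismatch in this scenario can be sketched as follows (member names and the up/down flags are illustrative, not from the ticket): elections count only voting members, while w:majority still counts every member:

```python
# After a forced reconfig sets votes:0 on the three down DC1 nodes:
members = {
    "dc1-a": {"votes": 0, "up": False},
    "dc1-b": {"votes": 0, "up": False},
    "dc1-c": {"votes": 0, "up": False},
    "dc2-a": {"votes": 1, "up": True},
    "dc2-b": {"votes": 1, "up": True},
}

voting = [m for m in members.values() if m["votes"] > 0]
election_majority = len(voting) // 2 + 1   # 2: the two DC2 nodes suffice
write_majority = len(members) // 2 + 1     # 3: still counts votes:0 nodes

up = [m for m in members.values() if m["up"]]
assert len(up) >= election_majority   # a primary can be elected ...
assert len(up) < write_majority       # ... but w:majority writes time out
```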
| Comment by Christopher Price [ 16/Oct/12 ] |
|
+1. In my use case I have a 3-node set: 1 primary, 1 visible secondary, and 1 hidden secondary. This feature would solve two problems for me.

Problem #1: As described above, a replica-safe write (using REPLICAS_SAFE) really isn't safe if it is only written to hidden nodes or nodes that can "never" get elected.

Problem #2: Throttling writes, when the replication chain is like this:

I estimate that this ticket would solve about 90% of our Mongo-based outages. Perhaps this could be a REPLICAS_SAFE replica set configuration option? Something that ensures that non-electable nodes do not sync directly off of the primary (unless it is the only thing available), and that visible secondaries never sync off of hidden or non-electable nodes (the chain should always go to a primary or visible secondary with the same or greater number of votes). |