[SERVER-14539] Full consensus arbiter (i.e. uses an oplog) Created: 12/Jul/14 Updated: 06/Dec/22 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Charlie Page | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 5 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
Allow arbiters to participate fully in consensus by keeping an oplog (as in Viewstamped Replication), configured with a proposed option such as:

--ArbiterWithOplog=<size in mb of oplog>

This would make w:majority, counting arbiters, work as expected: no rollbacks, provided no member falls off the oplog. It would therefore prevent the rollbacks that network flapping can cause in a set that uses an arbiter. It would also mean that, even with a majority of members up, a primary could not be elected until one of the data-bearing nodes has replicated the arbiter's oplog.

Let's say your replica set configuration calls for N data-bearing nodes and M arbiters. Consider the case where M and N are both positive. If exactly ceil(N/2) data nodes go offline but no arbiters do, you'll still have

Observe that the common, minimally expensive mode of operation for replica sets is the above with N=2 and M=1. If either data node is offline, no writes are rollback-proof (though, barring a second failure, none of them will be rolled back). Further, no write concern stronger than w=1 will be acknowledged to the application. At present, every replica set with one or more arbiters has such pathological modes of operation when enough data nodes fail.

This leaves the operator with 3 choices in a 3-node set with an arbiter (the same choices exist in a larger set; they are just easier to describe with a concrete example):

This would create choice 4:

For completeness, the 3 write concern choices in a replica set, which map onto larger sets: |
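The vote and acknowledgement arithmetic behind the N/M discussion can be sketched in Python. This is a simplified model, not server code: it assumes one vote per member and treats "majority" as a strict majority of all voting members, arbiters included (later server versions may compute the write concern majority differently). The class and function names are hypothetical. Passing arbiters_ack=True models the proposal, in which an arbiter holding an oplog could acknowledge writes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSet:
    """Hypothetical model: N data-bearing nodes and M arbiters, one vote each."""
    data_up: int
    data_total: int
    arbiters_up: int
    arbiters_total: int

    @property
    def majority(self) -> int:
        # Assumption: a strict majority of all voting members, arbiters included.
        return (self.data_total + self.arbiters_total) // 2 + 1

    def can_elect_primary(self) -> bool:
        # A primary needs votes from a majority and must itself hold data.
        return self.data_up >= 1 and self.data_up + self.arbiters_up >= self.majority

    def majority_ackable(self, arbiters_ack: bool) -> bool:
        # Today arbiters never acknowledge writes; with an oplog (the proposal)
        # they could count toward w:majority.
        ackers = self.data_up + (self.arbiters_up if arbiters_ack else 0)
        return ackers >= self.majority


def scenario(n: int, m: int) -> ReplicaSet:
    """N data nodes and M arbiters, with exactly ceil(N/2) data nodes offline."""
    down = math.ceil(n / 2)
    return ReplicaSet(data_up=n - down, data_total=n, arbiters_up=m, arbiters_total=m)


for n, m in [(2, 1), (4, 1), (3, 2)]:
    rs = scenario(n, m)
    print(
        f"N={n} M={m}: primary electable={rs.can_elect_primary()}, "
        f"w:majority today={rs.majority_ackable(arbiters_ack=False)}, "
        f"with arbiter oplog={rs.majority_ackable(arbiters_ack=True)}"
    )
```

For N=2, M=1 with one data node down, the model reports that a primary can still be elected but w:majority cannot be satisfied today, while it could be if the arbiter acknowledged writes from its own oplog. The sketch does not capture the proposal's extra election condition (a data-bearing node must first replicate the arbiter's oplog).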
| Comments |
| Comment by Jason R. Coombs [ 10/May/16 ] |
This feature would also serve another purpose: allowing the creation of nodes with unusually large oplogs for initializing new replicas when the oplog on the existing members is too small and is overrun before a new member can finish syncing. Such a feature would save us dozens of hours every year and make our replica sets much easier to manage. |
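The overrun described in this comment is, roughly, a comparison between how much history an oplog retains and how long an initial sync takes. The Python sketch below estimates that window assuming a steady oplog write rate; the function names, the constant-rate assumption, and the sizes used are illustrative, not measurements. A member with an unusually large oplog (such as the proposed arbiter with an oplog) is modeled simply as one more entry in the list of oplog sizes.

```python
from __future__ import annotations


def oplog_window_hours(oplog_size_mb: float, write_rate_mb_per_hour: float) -> float:
    """Approximate hours of history a fixed-size oplog retains at a steady write rate."""
    if write_rate_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / write_rate_mb_per_hour


def new_member_can_catch_up(
    member_oplog_sizes_mb: list[float],
    write_rate_mb_per_hour: float,
    initial_sync_hours: float,
) -> bool:
    """True if some member's oplog window is at least as long as the initial sync."""
    longest = max(
        oplog_window_hours(size, write_rate_mb_per_hour) for size in member_oplog_sizes_mb
    )
    return longest >= initial_sync_hours


print(new_member_can_catch_up([2048, 2048], 256, 10))
print(new_member_can_catch_up([2048, 2048, 20480], 256, 10))
```

With two 2048 MB oplogs and 256 MB/hour of oplog traffic, each member retains about 8 hours of history, so a 10-hour initial sync would be overrun; adding a node with a 20480 MB oplog extends the window to about 80 hours.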