Type: New Feature
Priority: Major - P3
Affects Version/s: None
Allow arbiters to fully participate in consensus by keeping an oplog (as in Viewstamped Replication). Proposed option: --ArbiterWithOplog=<size in MB of oplog>
This would mean that w:majority, counting arbiters, works as expected: no rollbacks, assuming no member falls off the oplog. In particular, it would prevent rollbacks caused by network flapping in sets that use an arbiter.
It would also mean that, even with a majority of members up, a primary may not be electable until one of the data-bearing nodes has replicated the arbiter's oplog.
Let's say your replica set configuration calls for N data-bearing nodes and M arbiters.
Consider the case where M and N are both positive. If exactly ceil(N/2) data nodes go offline but no arbiters do, you'll still have a primary, but no w:majority writes can be acknowledged. Therefore, any write into a replica set in such a degraded state is subject to eventual rollback.
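The arithmetic behind this degraded state can be sketched for the N=2, M=1 case. This is a simplified model, not the server's election or write concern logic: it assumes every member holds exactly one vote, that a primary needs a strict majority of votes plus at least one data node up, and that w:majority needs acknowledgements from a majority-sized number of data-bearing nodes. The function names are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def degraded_state(n_data: int, m_arbiters: int, data_down: int, arbiters_down: int = 0) -> dict:
    """Simplified model: one vote per member, arbiters vote but cannot acknowledge writes."""
    needed = majority(n_data + m_arbiters)
    data_up = n_data - data_down
    votes_up = data_up + (m_arbiters - arbiters_down)
    return {
        # A primary must be a data node and must see a majority of votes.
        "primary_possible": data_up > 0 and votes_up >= needed,
        # Only data-bearing nodes can acknowledge a write.
        "majority_ack_possible": data_up >= needed,
    }


# Two data nodes and one arbiter, with one data node offline.
print(degraded_state(n_data=2, m_arbiters=1, data_down=1))
```

Under these assumptions the set keeps a primary (2 of 3 votes are up) while w:majority cannot be satisfied (only 1 data node can acknowledge, and 2 are needed).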
Observe that the common, minimally expensive mode of operation for replica sets is the above with N=2 and M=1. If either data node is offline, no writes are rollback-proof (though, barring a second failure, none of them will get rolled back). Further, no write concern stronger than w:1 will be confirmed to the application.
At present, every replica set with at least one arbiter has such pathological modes of operation once sufficiently many data nodes fail.
This leaves the operator with 3 choices in a 3 node set with an arbiter (the choices are the same in a larger set, just easier to describe with a concrete example):
1) w:1 and accept rollbacks (i.e. silently lose data; it is currently possible to lose an arbitrary amount of data via multiple rollbacks)
2) w:2 and accept that the system goes down with the loss of a single node
3) Monitor rs.status and dynamically change the write concern before every write to try to get the best of both worlds
The proposed feature would create choice 4:
4) w:2, knowing that no committed write will be rolled back and that the loss of a single node won't take down the set (limited by the oplog window, but that can be sized to cover a week).
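To make choice 3 concrete, here is a minimal sketch of the decision an application would have to make before every write. It assumes the application has already fetched a status document shaped like the output of rs.status() (a "members" list whose entries carry "health" and "stateStr"); the function name choose_w and the sample documents are hypothetical.

```python
def choose_w(status: dict) -> int:
    """Return 2 when two data-bearing members look healthy, otherwise fall back to 1."""
    healthy_data = sum(
        1
        for member in status.get("members", [])
        # Arbiters report stateStr "ARBITER" and are deliberately not counted.
        if member.get("health") == 1
        and member.get("stateStr") in ("PRIMARY", "SECONDARY")
    )
    return 2 if healthy_data >= 2 else 1


# Hypothetical status documents for a 3 node set with one arbiter.
all_up = {"members": [
    {"name": "a:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "b:27017", "health": 1, "stateStr": "SECONDARY"},
    {"name": "c:27017", "health": 1, "stateStr": "ARBITER"},
]}
one_down = {"members": [
    {"name": "a:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "b:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
    {"name": "c:27017", "health": 1, "stateStr": "ARBITER"},
]}

print(choose_w(all_up), choose_w(one_down))
```

The status can change between the check and the write, so this approach can still hang on w:2 or quietly fall back to w:1. That race is why choice 3 only tries to get the best of both worlds.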
For completeness, the 3 write concern choices in a 3 node replica set, which can be mapped onto larger sets:
w:1, less than majority.
w:2, exactly majority.
w:3, greater than majority.