Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Assigned Teams: Replication
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Allow arbiters to participate fully in consensus by maintaining an oplog (as in Viewstamped Replication). Proposed option: --ArbiterWithOplog=<size in mb of oplog>

With this change, w:majority would count arbiters and work as expected: there would be no rollbacks, provided no member falls off the oplog. This would prevent the rollbacks that currently occur with network flapping in sets that use an arbiter.

It would also mean that, even with a majority of members up, a primary could not be elected until one of the data-bearing nodes has replicated the arbiter's oplog.
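A toy model can make the proposed behavior concrete. The sketch below is a minimal Python illustration under loud assumptions: Member, write and elect are hypothetical names, oplogs are plain lists that never diverge, and there are no terms, heartbeats or oplog truncation, so it is not a real consensus protocol. It models a PSA set whose arbiter keeps an oplog: while S is down, the arbiter's acknowledgment lets a write satisfy w:2, and when P later fails, S may only take over after copying the entries it is missing from the arbiter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    """One replica set member in a toy model of the proposal (hypothetical)."""
    name: str
    data_bearing: bool  # False for the arbiter-with-oplog
    up: bool = True
    oplog: List[str] = field(default_factory=list)


def write(members: List[Member], op: str) -> int:
    """Append op to every reachable member's oplog (primary and arbiter
    included) and return the number of acknowledgments."""
    acks = 0
    for m in members:
        if m.up:
            m.oplog.append(op)
            acks += 1
    return acks


def elect(members: List[Member]) -> Optional[Member]:
    """Elect a data-bearing primary from the reachable members.

    Assumed rule from the proposal: the winner must first copy the longest
    oplog held by any reachable member (possibly the arbiter's)."""
    up = [m for m in members if m.up]
    if len(up) <= len(members) // 2:
        return None  # no voting majority
    candidates = [m for m in up if m.data_bearing]
    if not candidates:
        return None  # an arbiter can never become primary
    source = max(up, key=lambda m: len(m.oplog))
    winner = max(candidates, key=lambda m: len(m.oplog))
    winner.oplog = list(source.oplog)  # catch up before stepping up
    return winner


# PSA set in which A keeps an oplog.
p, s, a = Member("P", True), Member("S", True), Member("A", False)
rs = [p, s, a]
write(rs, "op1")
s.up = False
assert write(rs, "op2") == 2   # P + A acknowledge: w:2 is satisfied
p.up, s.up = False, True       # P fails, S comes back
new_primary = elect(rs)
print(new_primary.name, new_primary.oplog)  # S ['op1', 'op2']
```

In this model op2 survives the failover because S catches up from the arbiter before it becomes primary; today's arbiters hold no oplog, so S would become primary without op2 and that write would later be rolled back on P.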

Let's say your replica set configuration calls for N data-bearing nodes and M arbiters.

Consider the case where M and N are both positive. If exactly ceil(N/2) data nodes go offline but no arbiters do, you can still have a primary (as long as the remaining floor(N/2) data nodes plus the M arbiters form a voting majority), but no w:majority writes can be acknowledged. Therefore, any write into a replica set in such a degraded state may eventually be rolled back.
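The arithmetic behind this degraded state can be checked with a short sketch. It is a simplified model, not MongoDB's exact rules: it assumes every member has one vote, arbiters vote but never acknowledge writes, and w:majority needs acknowledgments from a majority of all voting members. degraded_state is a hypothetical helper name.

```python
def degraded_state(n_data: int, n_arbiters: int) -> dict:
    """Take ceil(N/2) data-bearing nodes offline and report what is left
    (simplified model: one vote per member, arbiters never acknowledge)."""
    voters = n_data + n_arbiters
    majority = voters // 2 + 1
    data_down = (n_data + 1) // 2          # ceil(N/2) data nodes offline
    data_up = n_data - data_down           # floor(N/2) data nodes left
    return {
        "majority": majority,
        # a primary needs a voting majority and at least one data node
        "primary_electable": data_up > 0 and data_up + n_arbiters >= majority,
        # only data-bearing nodes can acknowledge a w:majority write
        "majority_writes": data_up >= majority,
    }


print(degraded_state(2, 1))
# {'majority': 2, 'primary_electable': True, 'majority_writes': False}
print(degraded_state(3, 1))
# {'majority': 3, 'primary_electable': False, 'majority_writes': False}
```

The first call is the common N=2, M=1 case: a primary survives but no w:majority write can be acknowledged. The second shows that whether a primary survives depends on M: with N=3 and M=1, the two remaining voters are not a majority of four.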

Note that the most common, least expensive replica set deployment is exactly this case with N=2 and M=1 (primary-secondary-arbiter). If either data node is offline, no write is rollback-proof (though, barring a second failure, none will actually be rolled back), and no write concern stronger than w:1 can be acknowledged to the application.

At present, every replica set with at least one arbiter has such a pathological mode of operation once enough data-bearing nodes fail.
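The claim can be checked exhaustively for small sets under the same simplified model (one vote per member, arbiters vote but never acknowledge, w:majority needs a majority of all voters). has_pathological_state is a hypothetical name; it looks for any number of data-node failures that leaves an electable primary unable to acknowledge w:majority writes.

```python
def has_pathological_state(n_data: int, n_arbiters: int) -> bool:
    """True if some number of data-node failures leaves an electable
    primary that cannot acknowledge w:majority writes."""
    majority = (n_data + n_arbiters) // 2 + 1
    for data_down in range(n_data + 1):
        data_up = n_data - data_down
        primary = data_up > 0 and data_up + n_arbiters >= majority
        majority_writes = data_up >= majority
        if primary and not majority_writes:
            return True
    return False


# Exhaustive check over small sets: pathological iff at least one arbiter.
for n in range(1, 8):
    for m in range(0, 4):
        assert has_pathological_state(n, m) == (m > 0), (n, m)
print("claim holds for N in 1..7, M in 0..3")
```

Without arbiters, "enough voters for a primary" and "enough data nodes for a majority write" are the same condition, so the two can never disagree; each arbiter adds a vote that cannot acknowledge, which opens the gap.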

This leaves the operator of a 3-node set with an arbiter three choices (a larger set has the same choices; they are just easier to describe with a concrete example):
1) w:1 and accept rollbacks (i.e. silently lose data; it is currently possible to lose an arbitrary amount of data through repeated rollbacks)
2) w:2 and accept that the set stops acknowledging writes when a single data-bearing node fails
3) Monitor rs.status() and change the write concern dynamically before every write, to try to get the best of both worlds
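Choice 3 can be sketched as a pure function over a status document. choose_w is a hypothetical helper, not a MongoDB API; it assumes a document shaped like replSetGetStatus output (a members list whose entries carry health and stateStr) and that every member votes. It returns "majority" when enough data-bearing members are healthy and otherwise falls back to 1.

```python
def choose_w(status: dict):
    """Hypothetical helper for choice 3: w:"majority" when enough
    data-bearing members look healthy, otherwise fall back to w:1."""
    members = status["members"]
    majority = len(members) // 2 + 1       # assumes every member votes
    healthy_data = sum(
        1
        for m in members
        if m.get("health") == 1 and m.get("stateStr") in ("PRIMARY", "SECONDARY")
    )
    return "majority" if healthy_data >= majority else 1


psa = {"members": [
    {"name": "p:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "s:27017", "health": 1, "stateStr": "SECONDARY"},
    {"name": "a:27017", "health": 1, "stateStr": "ARBITER"},
]}
print(choose_w(psa))  # majority
psa["members"][1]["health"] = 0
print(choose_w(psa))  # 1
```

With PyMongo, the status document could come from client.admin.command("replSetGetStatus") and the result could be applied with collection.with_options(write_concern=WriteConcern(w=...)). The check is inherently racy: a member can fail between reading the status and issuing the write, and every write made at w:1 during an outage remains exposed to rollback, which is why this only approximates the best of both worlds.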

The proposed feature would add a fourth choice:
4) w:2, knowing that committed writes will not be rolled back and that the loss of a single node will not take down the set (limited by the oplog window, which could be sized to cover a week).
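Sizing the arbiter's oplog for a given window is simple arithmetic, assuming a steady rate of oplog growth. The rate of 500 MB per hour below is an assumed workload, not a measurement, and both function names are hypothetical.

```python
HOURS_PER_WEEK = 24 * 7


def oplog_window_hours(oplog_mb: float, write_mb_per_hour: float) -> float:
    """How long the oplog covers before old entries are overwritten,
    assuming a steady oplog growth rate."""
    return oplog_mb / write_mb_per_hour


def oplog_mb_for_window(hours: float, write_mb_per_hour: float) -> float:
    """Oplog size needed to cover `hours` of writes at a steady rate."""
    return hours * write_mb_per_hour


# Assumed workload: 500 MB of oplog per hour.
size = oplog_mb_for_window(HOURS_PER_WEEK, 500)
print(size)                           # 84000
print(oplog_window_hours(size, 500))  # 168.0
```

In the proposal's syntax this would correspond to --ArbiterWithOplog=84000. A data-bearing node that stays down longer than the window falls off the oplog, and the rollback-free guarantee no longer covers it.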

For completeness, the three write concern choices in a 3-member replica set, and how they map to larger sets:
w:1, less than majority.
w:2, majority.
w:3, greater than majority.
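The mapping to larger sets depends only on where w falls relative to the voting majority. classify_w is a hypothetical helper, and it assumes every member of the set votes.

```python
def classify_w(w: int, voting_members: int) -> str:
    """Place a numeric write concern relative to the voting majority."""
    majority = voting_members // 2 + 1
    if w < majority:
        return "less than majority"
    if w == majority:
        return "majority"
    return "greater than majority"


for w in (1, 2, 3):
    print(f"3 members, w:{w} ->", classify_w(w, 3))
print("5 members, w:3 ->", classify_w(3, 5))  # majority
```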

is related to
SERVER-26717 PSA flapping during netsplit when using PV1 (Closed)
SERVER-18453 Avoiding Rollbacks in new Raft based election protocol (Closed)

related to
SERVER-20820 Arbiter with instant replay (Backlog)

Assignee: [DO NOT USE] Backlog - Replication Team
Reporter: Charlie Page (Inactive)
Participants: [DO NOT USE] Backlog - Replication Team, Charlie Page, Jason R. Coombs
Votes: 5
Watchers: 34

Created: Jul 12 2014 03:11:16 PM UTC
Updated: Dec 06 2022 05:04:02 AM UTC
