  1. Core Server
  2. SERVER-14539

Full consensus arbiter (i.e. uses an oplog)

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication

      Allow arbiters to participate fully in consensus by keeping an oplog (as in Viewstamped Replication). Proposed option: --ArbiterWithOplog=<size of oplog in MB>

      With this, w:majority would work as expected when arbiters are counted: no rollbacks, assuming no member falls off the oplog. It would also prevent the rollbacks that network flapping causes today in sets that use an arbiter.

      It would also mean that, even with a majority of members up, a primary may not be elected until one of the data-bearing nodes has replicated the arbiter's oplog.
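One possible reading of that election constraint, as a minimal sketch. It assumes each member's progress can be summarized as a single integer optime, that the set has exactly one oplog-carrying arbiter, and that the arbiter is reachable; `electable_nodes` and its parameters are hypothetical names for illustration, not part of any MongoDB API.

```python
def electable_nodes(data_optimes, arbiter_optime, total_voters):
    """Return the reachable data-bearing nodes that could become primary.

    data_optimes:   {node name: last applied optime} for reachable data nodes
    arbiter_optime: last optime recorded in the arbiter's oplog (assumed reachable)
    total_voters:   number of voting members in the replica set configuration
    """
    votes_up = len(data_optimes) + 1  # reachable data nodes plus the arbiter
    if votes_up < total_voters // 2 + 1:
        return []  # no majority reachable, so no election can succeed
    # Assumed rule: a candidate must already hold everything the arbiter holds.
    return sorted(n for n, t in data_optimes.items() if t >= arbiter_optime)
```

In a set of two data nodes and one arbiter, a surviving data node that is behind the arbiter would not be electable under this rule until it has caught up.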

      Let's say your replica set configuration calls for N data bearing nodes and M arbiters.

      Consider the case where M and N are both positive. If exactly ceil(N/2) data nodes go offline but no arbiters do, you'll still have a primary, but no w:majority writes can be acknowledged. Therefore, any write into a replica set in such a degraded state is subject to eventual rollback.

      Observe that the common, least expensive configuration for replica sets is the above with N=2 and M=1. If either data node is offline, no writes are rollback-proof (though, barring a second failure, none of them will get rolled back). Further, no write concern stronger than w:1 will be acknowledged to the application.

      At present, every replica set with at least one arbiter has such pathological modes of operation once sufficiently many data nodes fail.
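The N and M argument above can be checked with a small model. This is a simplified sketch, assuming every member has exactly one vote, a primary needs votes from a strict majority of all members, and a w:majority write needs acknowledgment from that same number of data-bearing nodes (arbiters hold no data and cannot acknowledge). The function names are illustrative only.

```python
def majority(n):
    """Smallest count that is strictly more than half of n."""
    return n // 2 + 1


def degraded_state(n_data, n_arbiters, data_down, arbiters_down=0):
    """Model what a replica set can still do after some members go offline."""
    needed = majority(n_data + n_arbiters)
    data_up = n_data - data_down
    votes_up = data_up + (n_arbiters - arbiters_down)
    # A primary must be a data-bearing node elected by a majority of voters.
    has_primary = data_up >= 1 and votes_up >= needed
    # Only data-bearing nodes can acknowledge a write.
    majority_ack = has_primary and data_up >= needed
    return {"has_primary": has_primary, "majority_ack": majority_ack}
```

For the common N=2, M=1 set, `degraded_state(2, 1, data_down=1)` reports a primary but no majority acknowledgment, which is the degraded state described above.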

      This leaves the operator with 3 choices in a 3-node set with an arbiter (the choices are the same in a larger set, just easier to describe with a concrete example):
      1) w:1 and accept rollbacks (i.e. silently lose data; it is currently possible to lose an arbitrary amount of data via multiple rollbacks)
      2) w:2 and accept that the system goes down when a single node fails
      3) Monitor rs.status() and dynamically change the write concern before every write to try to get the best of both worlds
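Choice 3 might look roughly like the sketch below. It assumes a status document shaped like the output of replSetGetStatus, with a `members` list whose entries carry `health` (1 when reachable) and `stateStr`; `choose_w` is a hypothetical helper, not an existing driver function.

```python
DATA_BEARING_STATES = {"PRIMARY", "SECONDARY"}


def choose_w(status):
    """Pick w:2 when two data-bearing members look healthy, else fall back to w:1."""
    healthy_data_nodes = sum(
        1
        for member in status.get("members", [])
        if member.get("health") == 1
        and member.get("stateStr") in DATA_BEARING_STATES
    )
    return 2 if healthy_data_nodes >= 2 else 1


# Example status for a set whose secondary is currently unreachable.
example_status = {
    "members": [
        {"name": "a:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "b:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
        {"name": "c:27017", "health": 1, "stateStr": "ARBITER"},
    ]
}
print(choose_w(example_status))
```

The status can change between the check and the write, so this only narrows the window in which a write is either blocked or exposed to rollback; it does not close it.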

      This proposal would create a fourth choice:
      4) w:2 and know that committed writes won't be rolled back and that the loss of a single node won't take down the set (limited by the oplog window, which can be sized to cover a week).

      For completeness, the 3 write concern choices in a 3-node replica set, which map onto larger sets:
      w:1, less than majority.
      w:2, majority.
      w:3, greater than majority.
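The mapping to larger sets can be sketched as a simple classification, assuming the majority is counted over all voting members (arbiters included), as in the discussion above. `classify_w` is an illustrative name.

```python
def classify_w(w, voting_members):
    """Classify a numeric write concern relative to the set's voting majority."""
    majority = voting_members // 2 + 1
    if w < majority:
        return "less than majority"
    if w == majority:
        return "majority"
    return "greater than majority"


# The 3-member case from the list above.
for w in (1, 2, 3):
    print(f"w:{w}, {classify_w(w, 3)}")
```

In a 5-member set the same function places w:3 at the majority, with w:1 and w:2 below it and w:4 and w:5 above it.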

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            charlie.page@10gen.com Charlie Page
            Votes:
            5
            Watchers:
            34