Drivers / DRIVERS-1307

Investigate changes in PM-1096: Initial Sync Semantics

Details

    • Type: Epic
    • Resolution: Won't Fix
    • Priority: Major - P3

    Description

      Downstream Change Summary

      Description of Linked Ticket

      Epic Summary

      Summary

      When a node is added to a replica set and goes into initial sync, its addition has an effect on the availability and durability guarantees of the replica set, both while it is in the STARTUP2 state and for some time after it transitions to SECONDARY. Those effects are poorly understood, difficult to reason about, and may not be what people expect. We should modify our behavior such that we no longer break any guarantees as part of initial sync, and generally bring our behavior more in line with user expectations.

      Motivation

      There are currently two main problems with initial sync semantics. The first is that adding a new voting node makes it possible for writes acknowledged with w:majority to roll back. This can happen for new writes that the initial syncing node acknowledges, if the initial sync then fails, and for writes that were acknowledged before the initial syncing node was added to the set, because adding the node changes the definition of majority.
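A minimal sketch of the arithmetic behind "changes the definition of majority", assuming a majority is a strict majority of voting members (floor(n/2) + 1). The function names `majority` and `is_majority_acknowledged` are hypothetical and only illustrate the counting; they are not part of any server or driver API, and the sketch does not model elections or the rollback itself.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def is_majority_acknowledged(acks: int, voting_members: int) -> bool:
    """True if `acks` acknowledgements satisfy a majority of the voting members."""
    return acks >= majority(voting_members)


# A write acknowledged by 2 nodes of a 3-voter set meets the majority threshold.
before = is_majority_acknowledged(acks=2, voting_members=3)

# After a 4th voting node (still in initial sync) is added, the same 2
# acknowledgements no longer meet the threshold, which is now 3.
after = is_majority_acknowledged(acks=2, voting_members=4)

print(before, after)  # True False
```

The point of the sketch is only that the threshold moves when a voter is added; whether a given write can actually roll back depends on election outcomes that this example does not model.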

      The second problem is that, since the switch to timestamp-based rollback in 4.0, a node that needs to roll back after completing initial sync but before committing a new operation as SECONDARY fails the rollback with an UnrecoverableRollbackError and needs a full resync. In earlier versions, which used rollback via refetch, the rollback would succeed. A full resync can also be required if the node crashes during that same time window. Users may not expect this new behavior, so it would be preferable to return to a world where a node that has exited the STARTUP2 state and transitioned to SECONDARY is stable and able to do anything another secondary in the set can do.

      Documentation

      Scope Document
      Design Document

      (ARCHIVED) Scope Document

          People

            Unassigned
            backlog-server-pm Backlog - Core Eng Program Management Team
            Votes: 0
            Watchers: 2
