SERVER-33806 (Core Server)

Oldest timestamp can move ahead of the commit point.

    • Fully Compatible
    • ALL
    • Repl 2018-03-26
    • 0

      The `oldest timestamp` is the time to which the storage engine maintains history: it can service any read with read_timestamp >= oldest_timestamp. The `commit point`/`committed snapshot` in replication is the timestamp up to which a majority of voting nodes have durably replicated. To serve majority reads (reads of data that cannot be rolled back), replication advances the commit point and then updates the `stable_timestamp`. A subtle detail is that updating the stable timestamp internally also updates the oldest timestamp to the same value.
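      The relationship between these timestamps can be sketched with a toy model. This is a hypothetical illustration, not MongoDB's storage engine API: timestamps are plain integers, and `ToyStorageEngine` and `canServiceMajorityRead` are invented names. It shows that setting the stable timestamp also moves the oldest timestamp, and that a majority read at the commit point is only serviceable while oldest_timestamp <= commit point.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not MongoDB's real API: timestamps are integers and
// the "engine" keeps history back to oldest_timestamp.
using Timestamp = std::uint64_t;

struct ToyStorageEngine {
    Timestamp oldest_timestamp = 0;
    Timestamp stable_timestamp = 0;

    // Reads at or after the oldest timestamp can be serviced; anything older
    // has been discarded.
    bool canServiceRead(Timestamp read_timestamp) const {
        return read_timestamp >= oldest_timestamp;
    }

    // Models the behavior described above: setting the stable timestamp also
    // moves the oldest timestamp to the same value.
    void setStableTimestamp(Timestamp ts) {
        stable_timestamp = ts;
        oldest_timestamp = ts;
    }
};

// A majority read is served at the commit point, so it succeeds only while
// oldest_timestamp <= commit_point.
bool canServiceMajorityRead(const ToyStorageEngine& engine, Timestamp commit_point) {
    return engine.canServiceRead(commit_point);
}
```

      For example, after `setStableTimestamp(10)` with a commit point of 10, a majority read at 10 succeeds, but a read at 9 can no longer be serviced because the oldest timestamp moved to 10 along with the stable timestamp.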

      However, there are conditions under which ReplicationCoordinatorImpl::updateCommittedSnapshot_inLock does not, in fact, move the commit point forward. Its return value does not signal this, so the calling function unconditionally goes on to set the stable timestamp. That moves the oldest timestamp ahead of the commit point and leaves the server in a state where a majority read would fail: the server is no longer keeping enough history to satisfy a read at the commit point.
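      The failure sequence can be sketched as follows. This is a hypothetical model that only loosely echoes the names in the issue; it is not the real ReplicationCoordinatorImpl code. `ToyNode`, `snapshotting_disabled` (a stand-in for whatever condition keeps the commit point from moving), `advanceUnconditionally`, and `advanceChecked` are all invented for illustration. The "checked" variant shows one possible fix, assuming the update function reports whether the commit point actually moved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not MongoDB's actual code.
using Timestamp = std::uint64_t;

struct ToyNode {
    Timestamp commit_point = 0;          // replication's committed snapshot
    Timestamp oldest_timestamp = 0;      // storage engine's history horizon
    bool snapshotting_disabled = false;  // stand-in for a condition that blocks the update

    // Returns true only if the commit point actually moved forward.
    bool updateCommittedSnapshot(Timestamp candidate) {
        if (snapshotting_disabled || candidate <= commit_point) {
            return false;
        }
        commit_point = candidate;
        return true;
    }

    // Setting the stable timestamp also moves the oldest timestamp.
    void setStableTimestamp(Timestamp ts) { oldest_timestamp = ts; }

    // Buggy shape: the stable timestamp is set even if the commit point did not move.
    void advanceUnconditionally(Timestamp candidate) {
        updateCommittedSnapshot(candidate);
        setStableTimestamp(candidate);
    }

    // Possible fix: only set the stable timestamp when the commit point advanced.
    void advanceChecked(Timestamp candidate) {
        if (updateCommittedSnapshot(candidate)) {
            setStableTimestamp(candidate);
        }
    }

    // A majority read at the commit point needs history back to the commit point.
    bool canServiceMajorityRead() const { return commit_point >= oldest_timestamp; }
};
```

      With `snapshotting_disabled` set, `advanceUnconditionally(5)` leaves the commit point at 0 but moves the oldest timestamp to 5, so a majority read fails; `advanceChecked(5)` leaves both at 0 and majority reads keep working.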

      Notably, the `disableSnapshotting` failpoint can trigger this condition and cause a consumer test, read_committed_on_secondary.js, to fail.

      It's unclear whether the contract of `setStableTimestamp` should explicitly state that the value may not be set ahead of the commit point, or whether the storage engine should instead expose to steady-state replication a way to advance the oldest timestamp and enforce this relationship there.
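      As a sketch of the first option, the "never ahead of the commit point" rule could be enforced at the setter itself. This is a hypothetical illustration only: `ToyTimestampManager`, `setCommitPoint`, and the clamping behavior are assumptions, not MongoDB's API or a decided design. Here the requested stable timestamp is clamped to the commit point, and since the oldest timestamp follows the stable timestamp, it can never pass the commit point either.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of one option: make the commit-point bound part of the
// setter's contract. Not MongoDB's actual API.
using Timestamp = std::uint64_t;

class ToyTimestampManager {
public:
    // The commit point only moves forward.
    void setCommitPoint(Timestamp ts) { commit_point_ = std::max(commit_point_, ts); }

    // Clamps the requested value to the commit point and never moves backwards.
    // The oldest timestamp follows the stable timestamp, as described in the
    // issue. Returns the stable timestamp after the call.
    Timestamp setStableTimestamp(Timestamp requested) {
        Timestamp applied = std::min(requested, commit_point_);
        stable_ = std::max(stable_, applied);
        oldest_ = stable_;
        return stable_;
    }

    Timestamp oldest() const { return oldest_; }
    Timestamp commitPoint() const { return commit_point_; }

private:
    Timestamp commit_point_ = 0;
    Timestamp stable_ = 0;
    Timestamp oldest_ = 0;
};
```

      With a commit point of 10, `setStableTimestamp(15)` applies only 10; once the commit point reaches 20, the same call applies 15. Clamping keeps the server serving majority reads even when a caller over-asks; rejecting the call outright (for example, via an invariant) would instead surface such caller bugs loudly, and which behavior is preferable is exactly the open question above.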

            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)