Core Server / SERVER-33846

Alternative for setting oplog read timestamp on secondaries

    • Fully Compatible
    • Repl 2018-03-26

      Currently, the oplog read timestamp is set by the same asynchronous mechanism regardless of replication state (PRIMARY or SECONDARY): a thread loop records the optime of the latest oplog entry with no holes after it, waits for journaling, and then publishes that optime as the new oplog read timestamp.
      The algorithm is correct for primary nodes. On secondary nodes, however, the loop does not need to wait for journaling: because the last applied time is stored durably, holes can never be read after an unclean shutdown of a secondary. Today, the stable timestamp (and the oldest timestamp) can race ahead of the oplog read timestamp on secondaries. By forgoing the journaling wait on secondaries, we can set the oplog read timestamp in lockstep with the stable and oldest timestamps, thus avoiding the race.

      The work for this ticket will be to change the oplog read timestamp loop to only operate while a node is in primary mode; in secondary mode, new code inserted into the applier loop will set the oplog read timestamp when the last applied time is set.
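
The proposed split can be sketched as a toy model. This is an illustration only, not the server's implementation: `Node`, `primary_visibility_step`, and `set_last_applied` are hypothetical names, timestamps are plain integers, the journal wait is reduced to a field update, and the stable and oldest timestamps are assumed (as a simplification) to advance to the last applied time.

```python
from dataclasses import dataclass
from enum import Enum


class ReplState(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"


@dataclass
class Node:
    """Toy stand-in for a replica set member (hypothetical, not a real API)."""
    state: ReplState
    oplog_read_ts: int = 0
    stable_ts: int = 0
    oldest_ts: int = 0
    last_applied: int = 0
    journaled_up_to: int = 0

    def primary_visibility_step(self, latest_no_holes_ts: int) -> bool:
        """One iteration of the asynchronous loop; a no-op outside primary mode."""
        if self.state is not ReplState.PRIMARY:
            return False
        # Stand-in for "wait for journal": mark the optime as durable first.
        self.journaled_up_to = max(self.journaled_up_to, latest_no_holes_ts)
        # Only then publish it as the new oplog read timestamp.
        self.oplog_read_ts = max(self.oplog_read_ts, latest_no_holes_ts)
        return True

    def set_last_applied(self, ts: int) -> None:
        """Called from the applier loop when the last applied time is set."""
        self.last_applied = ts
        if self.state is ReplState.SECONDARY:
            # No journal wait: the oplog read timestamp moves together with
            # the last applied time.
            self.oplog_read_ts = ts
        # Simplifying assumption: stable and oldest advance to the same point.
        self.stable_ts = ts
        self.oldest_ts = ts


secondary = Node(state=ReplState.SECONDARY)
secondary.set_last_applied(5)
# The stable timestamp cannot be ahead of the oplog read timestamp here.
assert secondary.stable_ts <= secondary.oplog_read_ts
```

In this model the secondary updates all three timestamps in a single call, so no window exists in which the stable timestamp is ahead of the oplog read timestamp.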

            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            milkie@mongodb.com Eric Milkie