Core Server / SERVER-33846

Alternative for setting oplog read timestamp on secondaries

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Repl 2018-03-26
    • Linked BF Score:
      0

      Description

      Currently, the oplog read timestamp is set by the same asynchronous mechanism regardless of replication state (PRIMARY or SECONDARY): a thread loop notes the optime of the latest oplog entry with no holes after it, waits for journaling, and then publishes that optime as the new oplog read timestamp.
      This algorithm is correct for primary nodes. On secondary nodes, however, the wait for journaling is unnecessary: because the last applied time is stored durably, it is never possible to read holes after an unclean shutdown of a secondary. Today, the stable timestamp (and the oldest timestamp) can race ahead of the oplog read timestamp on secondaries. By forgoing the wait for journaling on secondaries, we can set the oplog read timestamp in lockstep with the stable and oldest timestamps, thus avoiding the race.
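
      As a rough illustration of the current mechanism, the Python sketch below models one iteration of the publishing loop. It is not the server's code: the function names, the integer timestamps, and the representation of holes as a set of in-flight (uncommitted) timestamps are all assumptions made for the example, and "no holes" is interpreted here as every earlier oplog slot being committed.

```python
from typing import Callable, Iterable, Optional


def no_holes_point(committed: Iterable[int], in_flight: Iterable[int]) -> Optional[int]:
    """Return the largest committed timestamp below every in-flight one.

    Hypothetical helper: an in-flight timestamp is a reserved oplog slot whose
    write has not committed yet, i.e. a potential hole.
    """
    pending = list(in_flight)
    ceiling = min(pending) if pending else None
    visible = [ts for ts in committed if ceiling is None or ts < ceiling]
    return max(visible) if visible else None


def publish_step(
    committed: Iterable[int],
    in_flight: Iterable[int],
    wait_for_journal: Callable[[], None],
    current_read_ts: int,
) -> int:
    """One loop iteration: pick a candidate, wait for journaling, publish it."""
    candidate = no_holes_point(committed, in_flight)
    if candidate is None or candidate <= current_read_ts:
        # Nothing new to publish, so no journal wait is needed.
        return current_read_ts
    wait_for_journal()  # the step this ticket proposes to skip on secondaries
    return candidate
```

      With committed entries at 1, 2 and 4 and an in-flight write at 3, the sketch publishes 2, since publishing 4 would expose the hole at 3.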

      The work for this ticket is to change the oplog read timestamp loop so that it operates only while a node is in primary mode. In secondary mode, new code in the applier loop will set the oplog read timestamp whenever the last applied time is set.
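
      The proposed split might be sketched as follows. This is a toy model under stated assumptions, not the actual change: the `Node` class, its method names, the integer timestamps, and the counter standing in for a journal flush are all hypothetical, and the sketch assumes the read timestamp only moves forward.

```python
from enum import Enum


class Mode(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"


class Node:
    """Toy model of a replica set member (all names are illustrative)."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.last_applied = 0
        self.oplog_read_ts = 0
        self.journal_waits = 0  # counts simulated journal flushes

    def _wait_for_journal(self) -> None:
        self.journal_waits += 1

    def visibility_loop_step(self, no_holes_ts: int) -> bool:
        """Asynchronous loop body: only does work while the node is primary."""
        if self.mode is not Mode.PRIMARY:
            return False
        self._wait_for_journal()
        self.oplog_read_ts = max(self.oplog_read_ts, no_holes_ts)
        return True

    def set_last_applied(self, ts: int) -> None:
        """Called from the applier loop after a batch has been applied."""
        self.last_applied = ts
        if self.mode is Mode.SECONDARY:
            # No journal wait: the read timestamp moves together with last applied.
            self.oplog_read_ts = max(self.oplog_read_ts, ts)
```

      In this model a secondary that applies a batch ending at timestamp 5 immediately has an oplog read timestamp of 5 without any journal wait, while a primary still advances only through the loop.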

              People

              Assignee:
              daniel.gottlieb Daniel Gottlieb
              Reporter:
              milkie Eric Milkie
              Votes:
              1
              Watchers:
              6
