SERVER-55766 (Core Server)

Introduce an optimized "for restore" startup replication recovery mechanism

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.2.15, 4.4.7, 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Backport Requested: v4.4, v4.2
    • Sprint: Repl 2021-05-03, Repl 2021-05-17

      After a restore, users generally don't need to be able to roll back or do point-in-time (PIT) reads earlier than the top of the oplog.

      Replication recovery can also take a very long time after a restore, and the stable/oldest timestamp cannot advance during replication recovery. This isn't great even with durable history, but it can lead to very poor performance in 4.2, which predates durable history.

      We should provide a startup parameter that, when configured, applies oplog entries either:

      1. without timestamps to create no history, or
      2. with timestamps, but advancing the stable/oldest timestamp between batches
        so that the storage engine can evict history.
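
      The two options above can be sketched with a toy model. This is only an illustration of the idea, not the server implementation: `ToyStorage`, `recover`, the `mode` strings, and the integer timestamps are all hypothetical names and simplifications invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    """Hypothetical stand-in for a timestamped storage engine (not a real API)."""
    stable_ts: int = 0
    oldest_ts: int = 0
    # key -> list of (timestamp, value) versions retained as history
    history: dict = field(default_factory=dict)

    def write(self, key, value, ts=None):
        versions = self.history.setdefault(key, [])
        if ts is None:
            # Option 1: an untimestamped write keeps no history at all.
            versions.clear()
            versions.append((0, value))
        else:
            versions.append((ts, value))

    def advance(self, ts):
        # Option 2: move stable/oldest forward so older versions can be evicted.
        self.stable_ts = self.oldest_ts = ts
        for key, versions in list(self.history.items()):
            older = [v for v in versions if v[0] <= ts]
            newer = [v for v in versions if v[0] > ts]
            # Keep only the newest version at or before the oldest timestamp.
            self.history[key] = older[-1:] + newer

    def retained_versions(self):
        return sum(len(v) for v in self.history.values())


def recover(storage, oplog, batch_size, mode):
    """Apply oplog entries (ts, key, value) in batches using one of the two options."""
    for start in range(0, len(oplog), batch_size):
        batch = oplog[start:start + batch_size]
        for ts, key, value in batch:
            storage.write(key, value, ts if mode == "timestamped" else None)
        if mode == "timestamped":
            # Advance between batches instead of waiting for the end of recovery.
            storage.advance(batch[-1][0])


oplog = [(ts, "doc", f"v{ts}") for ts in range(1, 11)]

no_history = ToyStorage()
recover(no_history, oplog, batch_size=3, mode="untimestamped")

batched = ToyStorage()
recover(batched, oplog, batch_size=3, mode="timestamped")
```

      In both modes the toy engine ends up holding a single version of the document, whereas applying all ten entries with timestamps and never advancing would pin ten versions until recovery finished.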

      We may have to set the initial data timestamp at the end of recovery to prevent rollbacks or reads before the timestamp at the end of recovery. We also need to consider what happens when a node crashes halfway through recovery, and make sure it doesn't corrupt data in that case.

      This should only be supported and used in Atlas.

      Note that if a rollback were necessary to a point before the end of the recovery, the rollback would fail unrecoverably. If the restore was used to seed a new replica set, it is not expected that a node in that set would roll back to a point before the last seeded oplog entry.
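
      The rollback constraint in the note above amounts to a simple guard. A minimal sketch, assuming integer timestamps; `UnrecoverableRollbackError` and `check_rollback_target` are hypothetical names for illustration and do not correspond to server code:

```python
class UnrecoverableRollbackError(Exception):
    """Hypothetical error: the rollback target predates the end of recovery."""


def check_rollback_target(common_point_ts, end_of_recovery_ts):
    """Return the rollback target if history for it still exists, else raise.

    After a "for restore" recovery, no history is kept before
    end_of_recovery_ts, so any earlier rollback target cannot be honored.
    """
    if common_point_ts < end_of_recovery_ts:
        raise UnrecoverableRollbackError(
            f"cannot roll back to {common_point_ts}: "
            f"no history before end of recovery at {end_of_recovery_ts}"
        )
    return common_point_ts
```

      For a replica set seeded from the restore, every common point is expected to be at or after the last seeded oplog entry, so this guard should not trip in normal operation.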

      Credit to lingzhi.deng for this idea.

            matthew.russotto@mongodb.com Matthew Russotto
            judah.schvimer@mongodb.com Judah Schvimer