  1. Core Server
  2. SERVER-60509

onReplicationRollback should crash on failure

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.2.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Repl 2021-10-18, Repl 2021-11-01
    • 40

      Currently, we trigger the rollback op observers after recovering to the stable timestamp (stableTS) and resetting the lastApplied/lastDurable optimes. However, the op observers themselves can fail, which fails rollback without resetting the lastFetchedOpTime. As a result, we retry rollback but hang indefinitely, because replication first waits for lastApplied to reach lastFetchedOpTime before starting rollback. In this case, we are waiting to apply an oplog entry that no longer exists.
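
The hang described above can be illustrated with a toy model. This is only a sketch, not server code: optimes are modeled as plain integers, and `ReplState`, `recover_to_stable`, `wait_completes`, and `simulate` are hypothetical names invented for the illustration. The wait is bounded by `max_ticks` so the example terminates instead of hanging.

```python
from dataclasses import dataclass


@dataclass
class ReplState:
    """Toy stand-in for the optimes in the ticket; integers model optimes."""
    last_applied: int
    last_fetched: int
    oplog: list  # timestamps of oplog entries still present


def recover_to_stable(state: ReplState, stable_ts: int) -> None:
    """Model recovery: drop entries past stable_ts and reset lastApplied."""
    state.oplog = [ts for ts in state.oplog if ts <= stable_ts]
    state.last_applied = stable_ts
    # last_fetched is intentionally left untouched, mirroring the reported bug.


def wait_completes(state: ReplState, max_ticks: int = 5) -> bool:
    """Model the pre-rollback wait for lastApplied to reach lastFetched.

    Returns False when the wait makes no progress within max_ticks,
    which stands in for the indefinite hang.
    """
    for _ in range(max_ticks):
        if state.last_applied >= state.last_fetched:
            return True
        pending = [ts for ts in state.oplog if ts > state.last_applied]
        if pending:
            state.last_applied = min(pending)  # apply the next entry
    return False


def simulate(reset_last_fetched: bool) -> bool:
    """Run one failed rollback followed by the wait that precedes the retry."""
    state = ReplState(last_applied=10, last_fetched=10, oplog=list(range(1, 11)))
    recover_to_stable(state, stable_ts=7)
    # Assume the op observers fail at this point, so rollback is retried.
    if reset_last_fetched:
        state.last_fetched = state.last_applied
    return wait_completes(state)
```

In this model, `simulate(False)` never finishes the wait because the entries between the stable timestamp and lastFetched were truncated, while `simulate(True)` proceeds immediately. Resetting lastFetched is shown only to make the cause visible; it is not the fix the ticket proposes.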

      The server only fails rollback and crashes if the error returned is an UnrecoverableRollbackError; for any other error it retries rollback. We tend to use rollback op observers to clean up on-disk state (as demonstrated in the linked BF), so if that procedure fails, we should crash rather than retry.
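
The proposed policy can be sketched as follows. This is a simplified model under stated assumptions, not the actual change: `FatalRollbackFailure` is a hypothetical exception standing in for a server crash, `run_rollback` and its two callables are invented for the illustration, and `UnrecoverableRollbackError` is redefined locally as a stand-in for the error named in the ticket.

```python
class UnrecoverableRollbackError(Exception):
    """Local stand-in for the error named in the ticket."""


class FatalRollbackFailure(Exception):
    """Hypothetical: models the server crashing instead of retrying."""


def run_rollback(recover, run_op_observers) -> str:
    """Return "done" or "retry", or raise FatalRollbackFailure.

    recover: callable modeling recovery to the stable timestamp.
    run_op_observers: callable modeling the rollback op observers.
    """
    try:
        recover()
    except UnrecoverableRollbackError as exc:
        # Existing behavior: this error is already fatal.
        raise FatalRollbackFailure("unrecoverable rollback error") from exc
    except Exception:
        # Existing behavior: other failures are retried.
        return "retry"

    try:
        run_op_observers()
    except Exception as exc:
        # Proposed behavior: state was already reset by recovery, so a
        # retry cannot make progress; fail loudly instead.
        raise FatalRollbackFailure("op observer failed after recovery") from exc
    return "done"
```

The design choice modeled here is that any failure after recovery has reset the optimes is treated as fatal, since a retry from that point would wait on oplog entries that are gone.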

            Assignee:
            jason.chan@mongodb.com Jason Chan
            Reporter:
            jason.chan@mongodb.com Jason Chan
            Votes:
            0
            Watchers:
            6
