  1. Core Server
  2. SERVER-60509

onReplicationRollback should crash on failure

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.2.0
    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Repl 2021-10-18, Repl 2021-11-01
    • 40

    Description

      Currently, we trigger the rollback op observers after recovering to the stable timestamp (stableTS) and resetting the lastApplied/lastDurable optimes. However, an op observer can fail at that point, which fails the rollback without resetting the lastFetchedOpTime. As a result, we retry rollback but hang indefinitely, because replication first waits for lastApplied to reach the lastFetchedOpTime before starting rollback. In this case, we wait to apply an oplog entry that no longer exists.
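
      The hang can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyNode`, `recover_to_stable`, and `can_start_rollback` are hypothetical names invented for illustration, timestamps are plain integers, and none of this mirrors the server's actual classes. It shows that once the oplog has been truncated to the stable timestamp while the last fetched optime is left untouched, the "wait for lastApplied to reach lastFetched" condition can never be satisfied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a replica set member (not a real server class)."""
    oplog: List[int]      # timestamps of oplog entries still present on the node
    last_applied: int
    last_fetched: int

    def recover_to_stable(self, stable_ts: int) -> None:
        # Rollback drops entries past the stable timestamp and resets lastApplied.
        self.oplog = [ts for ts in self.oplog if ts <= stable_ts]
        self.last_applied = stable_ts
        # Assumption mirroring the bug: last_fetched is NOT reset here,
        # because the failure happens before that step.

    def can_start_rollback(self) -> bool:
        """Model of 'wait until lastApplied reaches lastFetched'.

        Returns False where the real server would keep waiting forever.
        """
        while self.last_applied < self.last_fetched:
            pending = [ts for ts in self.oplog if ts > self.last_applied]
            if not pending:
                return False  # nothing left to apply: the wait never completes
            self.last_applied = min(pending)
        return True


node = ToyNode(oplog=[1, 2, 3, 4, 5], last_applied=5, last_fetched=5)
node.recover_to_stable(3)            # op observer then fails; retry is attempted
stuck = not node.can_start_rollback()
```

      In this model `stuck` ends up true because entries 4 and 5 were truncated, so nothing can ever advance `last_applied` to 5.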

      The server only fails rollback and crashes if the error returned is an UnrecoverableRollbackError; for any other error it retries rollback. We tend to use rollback op observers to clean up on-disk state (as demonstrated in the linked BF), so if that procedure fails, we should crash instead of retrying.
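
      The difference between the current and proposed policies can be sketched as a small decision function. This is an illustrative assumption, not the server's implementation: `OpObserverError`, `on_rollback_failure`, and the `observer_failure_is_fatal` flag are hypothetical names, and only `UnrecoverableRollbackError` comes from the description above.

```python
class UnrecoverableRollbackError(Exception):
    """Error kind named in the ticket; modelled here as a plain exception."""


class OpObserverError(Exception):
    """Hypothetical: a failure raised from a rollback op observer."""


def on_rollback_failure(err: Exception, observer_failure_is_fatal: bool) -> str:
    """Return 'crash' or 'retry' for a failed rollback attempt."""
    if isinstance(err, UnrecoverableRollbackError):
        return "crash"                      # fatal under both policies
    if observer_failure_is_fatal and isinstance(err, OpObserverError):
        return "crash"                      # proposed: do not retry after observer failure
    return "retry"                          # everything else is retried


current = on_rollback_failure(OpObserverError("cleanup failed"), False)
proposed = on_rollback_failure(OpObserverError("cleanup failed"), True)
```

      Under the current policy the observer failure falls through to a retry, which is the path that leads to the hang; under the proposed policy it is treated as fatal.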

          People

            jason.chan@mongodb.com Jason Chan
            Votes: 0
            Watchers: 6
