  1. Core Server
  2. SERVER-58446

Allow checkpoints taken while backup cursor is open to be used during startup recovery



    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.0 Required
    • Component/s: WiredTiger


      Today, checkpoints taken while a WT backup cursor is open are not used for startup recovery. When a backup cursor is opened, WT makes the on-disk file layout look like a restored copy, so that the files can simply be copied. Part of this is writing a WiredTiger.backup file, which overrides the active WiredTiger.wt file and hides any checkpoints created after the backup cursor was opened. Once the file copies are complete and the backup cursor is closed, WT deletes the WiredTiger.backup file, and the newer checkpoints are again available to startup recovery.
      If the server undergoes an unclean shutdown while the backup cursor is open, startup recovery begins from the last checkpoint completed before the backup cursor was opened and replays the write-ahead log forward from that point. This can take a very long time if the backup cursor had been open for a long time on a busy system before the unclean shutdown.

      To fix this, we could implement the following algorithm:
      1. Just prior to opening the backup cursor, MongoDB would write a new file in the dbpath (or perhaps rewrite the storage.bson file) as a flag that a backup cursor is open.
      2. Just after closing the backup cursor, MongoDB would delete this new file or clear the flag in storage.bson.
      3. At startup, prior to calling wiredtiger_open(), MongoDB would check for the flag by looking for the file or looking in storage.bson. If the flag is present, MongoDB had an unclean shutdown with a backup cursor open. MongoDB would then delete the WiredTiger.backup file (if it exists) and clear the flag. This would allow WiredTiger startup recovery to see the newest checkpoint written prior to the unclean shutdown.
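
The three steps above could be sketched roughly as follows. This is an illustrative Python sketch only, not server code: the marker file name `backup_cursor_open.flag` and all function names are hypothetical, and it uses the separate-file variant rather than a flag in storage.bson. It assumes the marker must be durable on disk before the backup cursor is opened, hence the fsync calls.

```python
import os
from pathlib import Path

# Hypothetical marker name; the ticket does not specify one.
FLAG_NAME = "backup_cursor_open.flag"
# File name taken from the ticket description.
WT_BACKUP_NAME = "WiredTiger.backup"


def _fsync_dir(dbpath: Path) -> None:
    """Persist directory entry changes (POSIX only; skipped elsewhere)."""
    if os.name != "posix":
        return
    fd = os.open(dbpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def set_backup_flag(dbpath) -> None:
    """Step 1: call just before opening the backup cursor."""
    dbpath = Path(dbpath)
    with open(dbpath / FLAG_NAME, "w") as f:
        f.write("backup cursor open\n")
        f.flush()
        os.fsync(f.fileno())
    _fsync_dir(dbpath)


def clear_backup_flag(dbpath) -> None:
    """Step 2: call just after closing the backup cursor."""
    dbpath = Path(dbpath)
    (dbpath / FLAG_NAME).unlink(missing_ok=True)  # Python 3.8+
    _fsync_dir(dbpath)


def recover_before_wiredtiger_open(dbpath) -> bool:
    """Step 3: call at startup, before wiredtiger_open().

    Returns True if the flag was found, meaning the previous run
    ended uncleanly while a backup cursor was open.
    """
    dbpath = Path(dbpath)
    if not (dbpath / FLAG_NAME).exists():
        return False
    # Remove WiredTiger.backup first and the flag last, so a crash
    # between the two leaves the flag set and the step can be retried.
    (dbpath / WT_BACKUP_NAME).unlink(missing_ok=True)
    _fsync_dir(dbpath)
    clear_backup_flag(dbpath)
    return True
```

Deleting WiredTiger.backup before clearing the flag keeps the startup step idempotent: if the process dies partway through, the next startup sees the flag again and repeats the cleanup.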




            backlog-server-execution Backlog - Storage Execution Team
            milkie Eric Milkie