WiredTiger / WT-8904

Allow checkpoints taken while backup cursor is open to be used during startup recovery

    • Type: New Feature
    • Resolution: Won't Fix
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Summary
      (Copied from SERVER-58446): Today, checkpoints taken while a WT backup cursor is open are not used for startup recovery. When a backup cursor is opened, WT makes the on-disk file layout look like a restored backup so that the files can simply be copied. Part of this is writing a WiredTiger.backup file, which overrides the active WiredTiger.wt file and hides any checkpoints created after the backup cursor was opened. Once the file copies are complete and the backup cursor is closed, WT deletes the WiredTiger.backup file and the new checkpoints are again available to startup recovery.

      If the server undergoes an unclean shutdown while the backup cursor is open, the subsequent startup recovery begins with the last checkpoint completed before the backup cursor was opened and plays the write-ahead log forward from that point. This can take a very long time if the backup cursor had been open for a long time on a busy system before the unclean shutdown.
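
      To make the cost concrete, here is a toy model of the behaviour described above. It is not WiredTiger code: the DataDir class, its fields, and the use of plain integers as log positions are all assumptions made for illustration. The only rule taken from this ticket is that a present WiredTiger.backup overrides WiredTiger.wt when recovery picks its starting checkpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataDir:
    """Hypothetical stand-in for the metadata state of a database directory."""

    # Log position of the newest checkpoint recorded in WiredTiger.wt.
    wt_checkpoint: int
    # Log position of the checkpoint recorded in WiredTiger.backup,
    # or None when that file does not exist (no backup cursor open).
    backup_checkpoint: Optional[int] = None


def recovery_start(d: DataDir) -> int:
    """Log position from which log replay begins in this toy model."""
    # While WiredTiger.backup exists it overrides WiredTiger.wt,
    # so newer checkpoints are invisible to recovery.
    if d.backup_checkpoint is not None:
        return d.backup_checkpoint
    return d.wt_checkpoint


def replay_length(d: DataDir, log_end: int) -> int:
    """Amount of log that recovery has to replay after a crash at log_end."""
    return log_end - recovery_start(d)


# Backup cursor opened at position 100, checkpoints continue up to 900,
# then an unclean shutdown happens at position 1000.
d = DataDir(wt_checkpoint=100)
d.backup_checkpoint = 100          # backup cursor opened
d.wt_checkpoint = 900              # later checkpoints, hidden from recovery
with_backup_file = replay_length(d, log_end=1000)

d.backup_checkpoint = None         # WiredTiger.backup removed
without_backup_file = replay_length(d, log_end=1000)
```

      In this model the replay covers 900 units of log while WiredTiger.backup is present and 100 once it is gone, which is the gap this ticket is about.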

      Motivation

      In SERVER-58446 we worked around the problem by flagging when a backup cursor is open and, if we detect during recovery that one was open, removing the WiredTiger.backup file before starting WiredTiger.
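
      A minimal sketch of that kind of workaround follows. It is not the SERVER-58446 implementation: the marker file name, the function names, and the choice of a marker file as the "flag" are all hypothetical. Only the WiredTiger.backup file name and the overall flow (flag on open, check and delete before startup) come from this ticket.

```python
from pathlib import Path

# Hypothetical marker file used as the "backup cursor is open" flag.
MARKER = "backup_cursor_open.marker"
# File name taken from the ticket text.
WT_BACKUP = "WiredTiger.backup"


def note_backup_cursor_opened(dbpath: Path) -> None:
    """Record durably that a backup cursor is open (assumed mechanism)."""
    (dbpath / MARKER).touch()


def note_backup_cursor_closed(dbpath: Path) -> None:
    """Clear the flag after the backup cursor is closed cleanly."""
    (dbpath / MARKER).unlink(missing_ok=True)  # requires Python 3.8+


def prepare_for_startup(dbpath: Path) -> bool:
    """Run before opening the storage engine.

    Returns True if a leftover WiredTiger.backup file was removed.
    """
    marker = dbpath / MARKER
    if not marker.exists():
        # No flag: the last run did not crash with a backup cursor open,
        # so leave the directory untouched.
        return False
    backup = dbpath / WT_BACKUP
    removed = backup.exists()
    backup.unlink(missing_ok=True)
    marker.unlink()
    return removed
```

      In this sketch the file is deleted only when the flag is present, so a directory that contains WiredTiger.backup for some other reason is not modified.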

      We are bypassing the problem, but I feel we are solving it at the wrong layer. If there's a better way to solve it inside WT, we could remove the workaround from MongoDB.

      • How likely is it that this use case or problem will occur?
        Currently it should not happen (unless the workaround missed some cases), but it could recur if some internal WT details change.
      • If the problem does occur, what are the consequences and how severe are they?
        Longer recovery time.
      • Is this issue urgent?
        No, there's a workaround in place at a higher layer.

      Acceptance Criteria (Definition of Done)
      Checkpoints taken with an open backup cursor can be used for recovery, and the workaround introduced by SERVER-58446 can be removed from MongoDB.

      • Testing
        We need to make sure checkpoints taken with an open backup cursor can be used for recovery.
      • Documentation update
        No.

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            daniel.gomezferro@mongodb.com Daniel Gomez Ferro
            Votes:
            0
            Watchers:
            3
