WiredTiger / WT-14401

On recovery, ignore checkpoint log records related to an incomplete checkpoint

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Logging
    • Storage Engines, Storage Engines - Foundations
    • StorEng - 2025-04-25

      During the investigation of WT-14376, we noticed that a checkpoint in the log file is incomplete (a checkpoint start record without a matching checkpoint end). Recovering past this point has caused two classes of errors. In addition to the WT-14376 related errors, WT-11669 was also indirectly caused by an incomplete log record. It was fixed in a different way that kept the current way of log processing. In addition, WT-11669 is linked to WT-11297, also related to backup, and again caused by an incomplete log record.

      While there may be a fix more directly related to the error that causes recovery to fall apart in WT-14376, we might instead choose to require that a sequence of checkpoint log records (between start and end) be processed either in its entirety or not at all. By "checkpoint log records" we specifically mean records that update metadata row(s) in service to the checkpoint. This ticket follows that latter philosophy in finding a fix.

      The sequence of checkpoint log records is usually (always?) just three records: checkpoint_start, a commit record for metadata rows, and checkpoint end. The checkpoint_start record is only a reference point and performs no action. So if checkpoint_start is present and checkpoint end is not, the only record in question is the middle one. The fix needs to find that record and skip it during recovery. Any other records interleaved with the checkpoint records are still applied.
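
      The filtering described above can be sketched as a two-pass scan over the log. This is only an illustration under stated assumptions, not WiredTiger's recovery code: the `LogRecord` type, its `kind` strings, and the `checkpoint_metadata` flag are hypothetical stand-ins for however recovery would identify the commit record that updates metadata rows for the checkpoint.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified log record (not WiredTiger's real format)."""
    lsn: int
    kind: str  # assumed values: "checkpoint_start", "checkpoint_end", "commit"
    # Assumed flag: True when a commit updates metadata rows for a checkpoint.
    checkpoint_metadata: bool = False


def find_unmatched_checkpoint_start(records: List[LogRecord]) -> Optional[int]:
    """Pass 1: index of a checkpoint_start with no later checkpoint_end, or None."""
    open_start = None
    for i, rec in enumerate(records):
        if rec.kind == "checkpoint_start":
            open_start = i
        elif rec.kind == "checkpoint_end":
            open_start = None
    return open_start


def filter_incomplete_checkpoint(records: List[LogRecord]) -> List[LogRecord]:
    """Pass 2: drop checkpoint metadata commits that follow an unmatched start.

    Every other record, including records interleaved with the incomplete
    checkpoint, is kept so that recovery would still apply it.
    """
    start = find_unmatched_checkpoint_start(records)
    if start is None:
        return list(records)
    return [
        rec
        for i, rec in enumerate(records)
        if not (i > start and rec.kind == "commit" and rec.checkpoint_metadata)
    ]
```

      With a log that ends in a checkpoint_start, an ordinary commit, and a checkpoint metadata commit, the sketch keeps the ordinary commit and drops only the metadata commit. A log whose checkpoints are all complete passes through unchanged.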

      This should be safe because the skipped commit record updates the individual file checkpoint records, and it doesn't include any data for client tables. Losing those individual file checkpoints is recoverable, because any blocks used by the incomplete checkpoint are on the free list of the previous checkpoint, which is the one that remains in effect.

            Assignee:
            donald.anderson@mongodb.com Donald Anderson
            Reporter:
            donald.anderson@mongodb.com Donald Anderson
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: