Insufficient error context in WT_PANIC checkpoint failures in __wt_block_checkpoint_resolve


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Checkpoints

      Issue Summary

      When a checkpoint fails in __wt_block_checkpoint_resolve, the code raises WT_PANIC with a hardcoded error code, EINVAL. The log then contains only the generic message "the checkpoint failed, the system must restart." Without the original error or any surrounding context, the underlying cause of the failure is hard to diagnose.

      Context

      • The checkpoint failure path does not currently log the actual error or any context; it reports only the generic EINVAL code.
      • This was observed in a HELP ticket, where troubleshooting was hindered by the lack of detail in the logs.
      • Discussion suggests that in cases like WT_CKPT_INPROGRESS the errors are recoverable and already handled, whereas for WT_CKPT_PANIC_ON_FAILURE it would be valuable to know the actual ret value and more context.

      Proposed Solution

      • Enhance logging in __wt_block_checkpoint_resolve to include:
            • The actual error code returned (the ret value)
            • Additional context about where and how the checkpoint failed
      • Ensure that when WT_PANIC is triggered, the logs provide actionable information for debugging.
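
      The proposal above could look roughly like the following standalone C sketch. It is not WiredTiger code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the panic value is a placeholder, and the assumption that the recoverable state simply hands the error back to the caller is ours, not taken from the source. It only illustrates carrying the original ret value and the checkpoint state into the panic message instead of reporting a fixed EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the panic return value; not WiredTiger's definition. */
#define SKETCH_PANIC (-1)

/* Hypothetical mirror of the two checkpoint states named in the ticket. */
enum sketch_ckpt_state { SKETCH_CKPT_INPROGRESS, SKETCH_CKPT_PANIC_ON_FAILURE };

static const char *
sketch_state_name(enum sketch_ckpt_state state)
{
    return (state == SKETCH_CKPT_INPROGRESS ? "INPROGRESS" : "PANIC_ON_FAILURE");
}

/*
 * Hypothetical resolve step. "ret" is the error produced by the failed
 * checkpoint work. In the panic-on-failure state the message records the
 * block name, the state and the original error before returning the panic
 * value; otherwise the error is returned unchanged (assumed recoverable).
 */
static int
sketch_checkpoint_resolve(const char *block_name, enum sketch_ckpt_state state, int ret,
  char *msg, size_t msg_len)
{
    assert(block_name != NULL && msg != NULL && msg_len > 0);
    msg[0] = '\0';

    if (ret == 0)
        return (0);
    if (state != SKETCH_CKPT_PANIC_ON_FAILURE)
        return (ret);

    /* strerror covers system errors only; real code would need its own error-string lookup. */
    (void)snprintf(msg, msg_len,
      "%s: checkpoint resolve failed in state %s: error %d (%s); "
      "the checkpoint failed, the system must restart",
      block_name, sketch_state_name(state), ret, strerror(ret));
    fprintf(stderr, "%s\n", msg);
    return (SKETCH_PANIC);
}
```

      With this shape, a failure caused by an I/O error would surface the original error number and its description in the log line, next to the existing "must restart" text, rather than an unrelated EINVAL.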

      Original Slack thread
      This ticket was generated by AI from a Slack thread.

            Assignee:
            Jasmine Bi
            Reporter:
            Memento Slack Bot
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: