Take a checkpoint after committing an index build


    • Storage Execution
    • Storage Execution 2026-06-22

      Overview

      When an index build is committed — whether on a primary (via the commitIndexBuild oplog write) or a secondary (applying the commitIndexBuild oplog entry) — no forced checkpoint is triggered afterward. The completed index is written to the storage engine via a WriteUnitOfWork, but durability depends solely on the next periodic WiredTiger checkpoint (default: every 60 seconds).

      Background

      If a node crashes in the window between committing the index build and the next scheduled checkpoint, it must re-execute the entire index build from scratch during startup recovery, even though the commit had already happened and the index was fully built. Index builds are expensive, long-running operations, so this unnecessary re-execution can significantly lengthen recovery on large collections. This risk applies to any node (primary or secondary) that commits an index build.
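
The crash window described above can be illustrated with a deliberately simplified toy model. This is a sketch, not MongoDB code: `ToyStorageEngine`, `mustRebuildAfterCrash`, and the index name `a_1` are hypothetical, and the model treats "the index is missing after restart" as a stand-in for "recovery has to re-run the build". The real recovery path (oplog replay from the last checkpoint) is more involved.

```cpp
// Toy model of the crash window; all names here are hypothetical.
#include <cassert>
#include <set>
#include <string>

// Models a storage engine whose durable state only advances at checkpoints.
class ToyStorageEngine {
public:
    // A committed write is visible in memory but not yet checkpointed.
    void commitIndex(const std::string& name) { inMemory_.insert(name); }

    // Capture everything committed so far in the durable state.
    void checkpoint() { durable_ = inMemory_; }

    // A crash discards whatever the last checkpoint did not capture.
    void crashAndRestart() { inMemory_ = durable_; }

    bool hasIndex(const std::string& name) const {
        return inMemory_.count(name) > 0;
    }

private:
    std::set<std::string> inMemory_;
    std::set<std::string> durable_;
};

// Returns true if, in this model, recovery would have to rebuild the index.
bool mustRebuildAfterCrash(bool forceCheckpointAfterCommit) {
    ToyStorageEngine engine;
    engine.commitIndex("a_1");
    if (forceCheckpointAfterCommit) {
        engine.checkpoint();
    }
    engine.crashAndRestart();  // crash before the next periodic checkpoint
    return !engine.hasIndex("a_1");
}

// Usage: without the forced checkpoint the build is lost; with it, it survives.
void demoCrashWindow() {
    assert(mustRebuildAfterCrash(false));
    assert(!mustRebuildAfterCrash(true));
}
```

In this model the only difference between the two outcomes is whether a checkpoint happens between the commit and the crash, which is the gap this ticket proposes to close.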

      Scope of Work

      • src/mongo/db/index_builds/index_builds_coordinator.cpp — force a checkpoint after committing an index build (two-phase path, covers both primary commit and applyCommitIndexBuild() on secondaries)
      • src/mongo/db/index_builds/primary_driven/util.cpp — force a checkpoint after commit() (primary-driven path)
      • src/mongo/db/repl/oplog.cpp: commitIndexBuild oplog handler registration (reference point for secondary apply path)

      Note: While this can benefit all two-phase index builds, it is less necessary for Primary-Driven Index Builds, both because of the nature of Disaggregated Storage and because of how resumability will work for Primary-Driven Index Builds.

      Acceptance Criteria

      • After an index build is committed on any node (primary or secondary), a storage engine checkpoint is forced before returning
      • A node that crashes immediately after committing an index build does not re-run the index build during startup recovery

      Technical Notes

      • Both code paths must be addressed: the two-phase path (IndexBuildsCoordinator) and the primary-driven path (index_builds::primary_driven::commit())
      • The checkpoint should be forced after the WriteUnitOfWork commits, ensuring the durable index state is captured
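
The ordering in the notes above can be sketched with toy stand-ins. This is an assumption-laden illustration, not the server's implementation: `ToyEngine`, `ToyWriteUnitOfWork`, and `commitIndexBuildThenCheckpoint` are hypothetical names, and the toy unit of work only imitates the commit-or-roll-back behavior of a scoped write.

```cpp
// Ordering sketch only; toy stand-ins, not the real MongoDB classes.
#include <cassert>
#include <string>
#include <vector>

// Records storage events so the ordering can be inspected afterwards.
struct ToyEngine {
    std::vector<std::string> events;
    void forceCheckpoint() { events.push_back("checkpoint"); }
};

// Toy unit of work: rolls back on destruction unless commit() was called.
class ToyWriteUnitOfWork {
public:
    explicit ToyWriteUnitOfWork(ToyEngine& engine) : engine_(engine) {}
    ~ToyWriteUnitOfWork() {
        if (!committed_) {
            engine_.events.push_back("rollback");
        }
    }
    void commit() {
        committed_ = true;
        engine_.events.push_back("wuow-commit");
    }

private:
    ToyEngine& engine_;
    bool committed_ = false;
};

// Hypothetical commit path: the checkpoint is requested only after the unit
// of work has committed and gone out of scope.
void commitIndexBuildThenCheckpoint(ToyEngine& engine) {
    {
        ToyWriteUnitOfWork wuow(engine);
        // ... the write that marks the index build as committed goes here ...
        wuow.commit();
    }
    engine.forceCheckpoint();
}

// Usage: run the path once and return the recorded event order.
std::vector<std::string> runCommitPath() {
    ToyEngine engine;
    commitIndexBuildThenCheckpoint(engine);
    return engine.events;
}

void checkOrdering() {
    const std::vector<std::string> events = runCommitPath();
    assert(events.size() == 2);
    assert(events[0] == "wuow-commit");
    assert(events[1] == "checkpoint");
}
```

Requesting the checkpoint outside the unit-of-work scope means it can only capture the committed state. A checkpoint taken while the write is still pending would not include it.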

            Assignee:
            Alex Sarkesian
            Reporter:
            Alex Sarkesian