- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- Storage Execution 2026-06-22
Overview
When an index build is committed — whether on a primary (via the commitIndexBuild oplog write) or a secondary (applying the commitIndexBuild oplog entry) — no forced checkpoint is triggered afterward. The completed index is written to the storage engine via a WriteUnitOfWork, but durability depends solely on the next periodic WiredTiger checkpoint (default: every 60 seconds).
Background
If a node crashes in the window between committing the index build and the next scheduled checkpoint, it must replay and re-execute the entire index build from scratch during startup recovery, even though the commit had already happened and the index was fully built. Index builds are expensive, long-running operations, so this unnecessary re-execution can significantly lengthen recovery on large collections. This risk applies to any node (primary or secondary) that commits an index build.
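The crash window described above can be illustrated with a deliberately simplified model. This is a sketch only: `ToyNode` and all of its members are hypothetical names invented for illustration, not MongoDB or WiredTiger types, and the model assumes that any index committed after the last checkpoint must be rebuilt during recovery.

```cpp
#include <set>
#include <string>

// Toy model only: none of these names exist in MongoDB or WiredTiger.
struct ToyNode {
    std::set<std::string> committed;         // index builds committed since startup
    std::set<std::string> inLastCheckpoint;  // index builds captured by the last checkpoint
    int rebuildsDuringRecovery = 0;          // index builds re-run by startup recovery

    // Periodic or forced checkpoint: capture everything committed so far.
    void checkpoint() { inLastCheckpoint = committed; }

    // Commit an index build and optionally force a checkpoint right away.
    void commitIndexBuild(const std::string& name, bool forceCheckpoint) {
        committed.insert(name);
        if (forceCheckpoint) {
            checkpoint();
        }
    }

    // Crash followed by startup recovery: every index that was committed but
    // not captured by a checkpoint has to be rebuilt from scratch.
    void crashAndRecover() {
        for (const auto& name : committed) {
            if (inLastCheckpoint.count(name) == 0) {
                ++rebuildsDuringRecovery;
            }
        }
    }
};
```

In this model, a node that commits without a forced checkpoint and then crashes records one rebuild, while a node that forces the checkpoint at commit time records none.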
Scope of Work
- src/mongo/db/index_builds/index_builds_coordinator.cpp — force a checkpoint after committing an index build (two-phase path, covers both primary commit and applyCommitIndexBuild() on secondaries)
- src/mongo/db/index_builds/primary_driven/util.cpp — force a checkpoint after commit() (primary-driven path)
- src/mongo/db/repl/oplog.cpp — commitIndexBuild oplog handler registration (reference point for secondary apply path)
Note: Although all two-phase index builds can benefit from this change, it is less necessary for Primary-Driven Index Builds, both because of the nature of Disaggregated Storage and because of the way resumability will work for Primary-Driven Index Builds.
Acceptance Criteria
- After an index build is committed on any node (primary or secondary), a storage engine checkpoint is forced before the commit call returns
- A node that crashes immediately after committing an index build does not re-run the index build during startup recovery
Technical Notes
- Both code paths must be addressed: the two-phase path (IndexBuildsCoordinator) and the primary-driven path (index_builds::primary_driven::commit())
- The checkpoint should be forced after the WriteUnitOfWork commits, ensuring the durable index state is captured
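The ordering required by these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual change: `FakeStorageEngine`, `commitUnitOfWork`, `forceCheckpoint`, and `commitIndexBuildDurably` are hypothetical stand-ins that only record the order of operations, and the real code would use the server's own WriteUnitOfWork and storage engine interfaces.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the storage engine; it only records event order.
struct FakeStorageEngine {
    std::vector<std::string> events;
    void commitUnitOfWork() { events.push_back("wuow-commit"); }
    void forceCheckpoint() { events.push_back("checkpoint"); }
};

// Hypothetical helper: apply the final index-build write, commit the unit of
// work, and only then force a checkpoint, so that the checkpoint captures the
// committed index state before the function returns.
void commitIndexBuildDurably(FakeStorageEngine& engine,
                             const std::function<void()>& markIndexReady) {
    markIndexReady();           // the write performed inside the unit of work
    engine.commitUnitOfWork();  // the index build is now committed
    engine.forceCheckpoint();   // the committed state is captured immediately
}
```

If `markIndexReady` throws, neither the commit nor the checkpoint runs, which matches the intent that a checkpoint is forced only for a successfully committed index build. Both paths named above (the two-phase path and the primary-driven path) would need an equivalent step after their respective commits.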