Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- index-builds

Assigned Teams: Storage Execution
Sprint: Storage Execution 2025-12-22, Storage Execution 2026-01-05, Storage Execution 2026-01-19, Storage Execution 2026-02-02, Storage Execution 2026-02-16, Storage Execution 2026-03-02
Case:
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

If a node crashes after committing an index build but before that commit is captured in a checkpoint, startup recovery must perform a full index rebuild before the node becomes available again. The impact can be catastrophic when the crash comes immediately after the commit, because the node has to repeat the entire build it just finished.

We should evaluate a targeted, backportable solution for avoiding this worst-case recovery scenario. For example, could we persist resume info after finishing the index build and before voting for commit?
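To make the proposed ordering concrete, here is a toy model of the crash window. It is only a sketch: `ToyNode`, its fields, and the returned strings are hypothetical and do not correspond to the server's actual classes or recovery code. It assumes that resume info, once written, survives a crash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    """Hypothetical model of one node; not the server's real data structures."""
    persist_resume_info_before_vote: bool
    resume_info: Optional[dict] = None    # assumed durable once written
    commit_in_checkpoint: bool = False    # True once a checkpoint includes the commit

    def build_and_commit(self) -> None:
        # The index build has finished its work at this point.
        if self.persist_resume_info_before_vote:
            # Proposed step: persist resume info before voting for commit.
            self.resume_info = {"phase": "ready-to-commit"}
        # Vote for commit, then commit. The commit is not yet checkpointed.

    def checkpoint(self) -> None:
        self.commit_in_checkpoint = True

    def recover_after_crash(self) -> str:
        if self.commit_in_checkpoint:
            return "no rebuild"
        if self.resume_info is not None:
            return "resume"
        return "full rebuild"


today = ToyNode(persist_resume_info_before_vote=False)
today.build_and_commit()
print(today.recover_after_crash())      # crash before checkpoint, no resume info

proposed = ToyNode(persist_resume_info_before_vote=True)
proposed.build_and_commit()
print(proposed.recover_after_crash())   # crash before checkpoint, resume info present
```

The model ignores the cost of writing the resume info itself, which would be part of evaluating whether the approach is worthwhile.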

One caveat is that this may still be subject to the limitations of resumable index builds:

- Index builds can only be resumed once
- Index builds with non-default commitQuorum cannot be resumed
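The two limitations above can be read as an eligibility check that any fix built on resumable index builds would inherit. The sketch below is illustrative only: `can_resume` and `times_resumed` are hypothetical names, and it assumes the default commitQuorum is `"votingMembers"`.

```python
# Assumption: "votingMembers" is the default commitQuorum value.
DEFAULT_COMMIT_QUORUM = "votingMembers"


def can_resume(commit_quorum, times_resumed: int) -> bool:
    """Hypothetical check mirroring the two limitations listed in this ticket."""
    if commit_quorum != DEFAULT_COMMIT_QUORUM:
        return False          # non-default commitQuorum cannot be resumed
    return times_resumed == 0  # a build can only be resumed once


print(can_resume("votingMembers", 0))  # eligible
print(can_resume("votingMembers", 1))  # already resumed once
print(can_resume("majority", 0))       # non-default commitQuorum
```

Any build that fails this check would presumably still fall back to the full rebuild described above.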

is related to
SERVER-124042 Index build commit quorum may be met while a majority of nodes are still building (Backlog)

related to
SERVER-115247 Index builds should be resumable more than once (Backlog)
SERVER-114363 Capture index build blocking replication metrics (Closed)

Assignee: Unassigned
Reporter: Louis Williams
Participants: Githook User, Louis Williams, TPM Jira Automations Bot
Votes: 0
Watchers: 21

Due: 28/Dec/25
Created: Oct 10 2025 03:15:20 PM UTC
Updated: May 27 2026 08:00:05 PM UTC
