Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- index-builds

Assigned Teams:

Storage Execution
Sprint:
Storage Execution 2025-12-22, Storage Execution 2026-01-05, Storage Execution 2026-01-19
Case:
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

An annoying limitation of resumable index builds is that they are only resumable once.

However, I think making them resumable more than once is feasible with a trick that was added in 8.0 to support independent oplog writing and application. Essentially, rather than waiting for the top of the oplog to be majority-committed (which we do on the primary in steady state), during startup recovery or secondary steady state we wait for the startIndexBuild oplog entry to be majority-committed. This condition is trivially true during startup recovery because we're replaying writes that are majority-acknowledged.

~~The restrictions here about not being resumable in replication recovery are no longer restrictions, and deleting them should "just work" with additional testing to verify the theory.~~

This idea doesn't work. An index build is only "resumable" when 1) it reads only data that cannot be rolled back (i.e. majority-committed data) and 2) it also persists its state during clean shutdown. The majority commit point is unavailable during replication recovery, and the stable checkpoint can be arbitrarily lagged, so we don't know which timestamp is safe to read from to guarantee that we only read majority-committed data and keep the index build re-resumable. We can't just wait until the majority commit point is available, because a commit oplog entry encountered during recovery would get stuck.
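To make the two conditions concrete, here is a minimal Python sketch of the resumability check described above. It is not server code: `BuildState`, its fields, and `is_resumable` are hypothetical names invented for illustration, and timestamps are modeled as plain integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildState:
    """Hypothetical snapshot of an in-progress index build (illustrative only)."""
    read_timestamp: int                    # timestamp the build is reading at
    majority_commit_point: Optional[int]   # None while replication recovery is running
    persisted_on_clean_shutdown: bool      # resume state was written at clean shutdown


def is_resumable(state: BuildState) -> bool:
    # Condition 1: the build must only have read data that cannot be rolled back.
    # Without a known majority commit point, that cannot be established.
    if state.majority_commit_point is None:
        return False
    reads_only_committed = state.read_timestamp <= state.majority_commit_point
    # Condition 2: the build must have persisted its state during clean shutdown.
    return reads_only_committed and state.persisted_on_clean_shutdown
```

In this model, a build running during replication recovery (`majority_commit_point=None`) is never resumable, which is the gap the alternative idea tries to close.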

An alternative idea:

After resuming the index build, if the majority commit point isn't available yet (i.e. startup recovery hasn't completed), we will read at the stable timestamp. This is safe because draining before commit is a best-effort step whose only purpose is to avoid a long critical section during commit. In the commit critical section, we revert to reading at lastApplied for correctness, to make sure we apply everything (this is already true).
If the commit is received after startup recovery, then the majority commit point will have already been established, and we will do a final drain reading at majority before committing (this would operate as it does today).
If the commit is received during startup recovery, we will need to apply any side writes from the stable timestamp to the commit timestamp. This will still block startup, but is still better than a full index rebuild. When paired with SERVER-112315 (persist the resume info after checkpointing and before voting for commit), this means we should have very little work to do in this phase.
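The proposal above can be sketched as a small decision function. This is an illustrative Python model only: `ReadSource`, `choose_read_source`, and `side_writes_to_apply_at_commit` are hypothetical names, timestamps are plain integers, and treating the catch-up range as exclusive of the stable timestamp and inclusive of the commit timestamp is an assumption the ticket does not spell out.

```python
from enum import Enum
from typing import Iterable, List


class ReadSource(Enum):
    STABLE = "stable timestamp"
    MAJORITY = "majority commit point"
    LAST_APPLIED = "lastApplied"


def choose_read_source(in_commit_critical_section: bool,
                       majority_point_available: bool) -> ReadSource:
    # Inside the commit critical section, read at lastApplied so that
    # every side write is applied before the index is committed.
    if in_commit_critical_section:
        return ReadSource.LAST_APPLIED
    # After startup recovery the majority commit point is established,
    # so pre-commit drains read at majority, as they do today.
    if majority_point_available:
        return ReadSource.MAJORITY
    # During startup recovery, fall back to the stable timestamp.
    # Pre-commit draining is best-effort, so lagging here is acceptable.
    return ReadSource.STABLE


def side_writes_to_apply_at_commit(side_write_timestamps: Iterable[int],
                                   stable_ts: int,
                                   commit_ts: int) -> List[int]:
    # Commit received during startup recovery: catch up on the side writes
    # between the stable timestamp and the commit timestamp.
    # Assumed bounds: (stable_ts, commit_ts].
    return [ts for ts in side_write_timestamps if stable_ts < ts <= commit_ts]
```

The closer the stable timestamp is to the commit timestamp, the shorter the list returned by `side_writes_to_apply_at_commit`, which is why pairing this with SERVER-112315 should leave little work for the commit phase.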

is related to

SERVER-112315 Avoid full index rebuild during startup when crashing after commit

Backlog

Assignee: Unassigned
Reporter: Louis Williams
Participants: Louis Williams
Votes: 0
Watchers: 7

Created: Dec 10 2025 07:57:23 PM UTC
Updated: Jul 22 2026 03:36:45 AM UTC
