Primary-driven index build interrupted by stepdown is incorrectly counted as failed in index_builds.failed metric


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution
    • Fully Compatible
    • ALL
    • Storage Execution 2026-06-08

      Overview

      On a primary-driven index build (PDIB), when the primary steps down, the in-memory build on that node is torn down so the node can finish the index as a secondary (the new primary resumes and commits the build). This teardown currently increments the index_builds.failed OTEL counter even though the build did not fail: it was handed off and will be resumed.

      Background

      The build is unregistered via ActiveIndexBuilds::unregisterIndexBuild(..., IndexBuildOutcome::kFailure) in the PDIB stepdown cleanup paths. The new primary separately records index_builds.resume.succeeded when it resumes the build, so a failure recorded by the old primary counts the same build twice and makes a normal failover look like an error.

      Scope of Work

      • src/mongo/db/index_builds/active_index_builds.h – add a neutral IndexBuildOutcome::kToBeResumed outcome.
      • src/mongo/db/index_builds/active_index_builds.cpp – recordIndexBuildOutcome treats kToBeResumed as a no-op for the outcome counters (neither succeeded nor failed is incremented; the active gauge is still decremented).
      • src/mongo/db/index_builds/index_builds_coordinator.cpp – route the three PDIB stepdown teardown sites (LOGV2 12741700 / 12741701 / 12741702) through kToBeResumed.
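
      The intended behavior of the neutral outcome can be sketched roughly as below. This is a minimal standalone illustration, not the actual server code: IndexBuildMetrics, registerIndexBuild, stepdownIsNeutral, and the kSuccess enumerator are hypothetical stand-ins, and the real recordIndexBuildOutcome signature and OTEL plumbing may differ. Only kFailure and kToBeResumed are named in this ticket.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the outcome enum. kFailure and kToBeResumed are
// named in the ticket; kSuccess is an assumption made for illustration.
enum class IndexBuildOutcome { kSuccess, kFailure, kToBeResumed };

// Hypothetical plain-integer stand-ins for the OTEL gauge and counters.
struct IndexBuildMetrics {
    std::int64_t active = 0;     // builds currently registered on this node
    std::int64_t succeeded = 0;  // builds that finished successfully
    std::int64_t failed = 0;     // builds that genuinely failed
};

// Registering a build bumps the active gauge (assumed behavior).
inline void registerIndexBuild(IndexBuildMetrics& m) {
    ++m.active;
}

// Sketch of the outcome recording: the active gauge always drops, but only
// real successes and failures touch the outcome counters.
inline void recordIndexBuildOutcome(IndexBuildMetrics& m, IndexBuildOutcome outcome) {
    --m.active;
    switch (outcome) {
        case IndexBuildOutcome::kSuccess:
            ++m.succeeded;
            break;
        case IndexBuildOutcome::kFailure:
            ++m.failed;
            break;
        case IndexBuildOutcome::kToBeResumed:
            // Handed off on stepdown: neither a success nor a failure here.
            break;
    }
}

// Simulates a PDIB stepdown teardown and reports whether it left the
// outcome counters untouched while releasing the active gauge.
inline bool stepdownIsNeutral() {
    IndexBuildMetrics m;
    registerIndexBuild(m);
    assert(m.active == 1);
    recordIndexBuildOutcome(m, IndexBuildOutcome::kToBeResumed);
    return m.active == 0 && m.succeeded == 0 && m.failed == 0;
}
```

      In this sketch a genuine abort would still call recordIndexBuildOutcome with kFailure and increment failed, which is the distinction the acceptance criteria rely on.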

      Acceptance Criteria

      • A PDIB interrupted by stepdown does not increment index_builds.failed on the stepped-down node.
      • Genuine aborts, setup failures, and shutdown still record index_builds.failed.
      • jstests/noPassthrough/index_builds/index_stepdown_failover.js under PDIB expects the stepped-down primary to record the build as neither succeeded nor failed.

      Technical Notes

      • The pre-init stepdown path (index_stepdown_before_init.js) fails during _setUpIndexBuild before completeSetup() and still records a failure – out of scope and unchanged.

            Assignee:
            Gregory Noma
            Reporter:
            Gregory Noma