Check index build resumed via metric


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Storage Execution 2026-07-06
    • 200

      Summary

      The index_builds.js library sometimes reads log messages to check properties about resumable index builds. This mode of testing is inherently fragile because the log buffer is circular/bounded. If the log buffer limit is exceeded, the relevant log can be overwritten and the test can incorrectly fail.

      We should replace log checks with metric checks in index_builds.js.
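
      The failure mode described above can be illustrated with a toy bounded buffer. This is a minimal sketch, not the server's log implementation: the BoundedLog class, its capacity of 3, and the placeholder entries are all hypothetical, and only the log ID 4841700 comes from this ticket.

```javascript
// Illustrative only: a tiny bounded log buffer (hypothetical, not server code).
class BoundedLog {
    constructor(capacity) {
        this.capacity = capacity;
        this.entries = [];
    }

    // Append an entry, evicting the oldest one once capacity is exceeded.
    append(entry) {
        this.entries.push(entry);
        if (this.entries.length > this.capacity) {
            this.entries.shift();
        }
    }

    // Return true if an entry with the given log ID is still in the buffer.
    has(id) {
        return this.entries.some((e) => e.id === id);
    }
}

const log = new BoundedLog(3);
log.append({id: 4841700, msg: "placeholder"});
const foundEarly = log.has(4841700);

// Unrelated log traffic fills the buffer before the test reads it.
for (let i = 0; i < 3; i++) {
    log.append({id: 1000 + i, msg: "unrelated"});
}
const foundLate = log.has(4841700);
```

      In this sketch the entry is visible immediately after it is emitted but is gone once enough unrelated entries arrive, so a check that runs late fails even though the event occurred.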

      Motivation

      Consider checkResume(), which checks for log ID 4841700 to "ensure that the resume info contains the correct phase to resume from." This log ID was emitted but lost in BF-43501. BF-40886 was also caused by this issue.

      Proposed Solution

      Existing log checks for index build resume state can be replaced with OpenTelemetry metrics. index_builds.resume.started with a phase attribute would address BF-43501.

      This approach is already used in SPM-4469. See PrimaryDrivenResumableIndexBuildTest._readResumeMetrics and its usage. Since checkResume() does not currently have access to the OTel exporter, this change would require:

      1. Creating a metrics directory (createMetricsDirectory(jsTestName()))
      2. Passing openTelemetryMetricsDirectory() to the mongod nodes via a server parameter
      3. Snapshotting the relevant metric(s) before the node restart that triggers the resume.
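
      The comparison that would follow step 3 can be sketched as below. This is a rough illustration under stated assumptions, not the existing _readResumeMetrics implementation: the record shape ({name, attributes, value}), the helper names sumMetric and resumeStartedDelta, and the phase value "collection scan" are hypothetical. Only the metric name index_builds.resume.started and its phase attribute come from this ticket. The sketch also assumes the exported values are cumulative across the restart.

```javascript
// Hypothetical record shape: {name: string, attributes: object, value: number}.
// How records are parsed out of the metrics directory is out of scope here.

// Sum the values of all records matching the metric name and every given attribute.
function sumMetric(records, name, attributes) {
    return records
        .filter((r) => r.name === name)
        .filter((r) =>
            Object.entries(attributes).every(
                ([key, value]) => r.attributes && r.attributes[key] === value))
        .reduce((total, r) => total + r.value, 0);
}

// Compare a snapshot taken before the restart with one taken after the resume.
function resumeStartedDelta(before, after, phase) {
    const name = "index_builds.resume.started";
    return sumMetric(after, name, {phase}) - sumMetric(before, name, {phase});
}

// Placeholder data standing in for parsed exporter output.
const before = [];
const after = [
    {name: "index_builds.resume.started", attributes: {phase: "collection scan"}, value: 1},
];
const delta = resumeStartedDelta(before, after, "collection scan");
```

      A test could then assert that the delta for the expected phase is exactly 1 and that the delta for every other phase is 0, which covers what the log ID 4841700 check verifies today without depending on the log buffer.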

            Assignee:
            Gregory Noma
            Reporter:
            Cedric Sirianni
            Votes:
            0
            Watchers:
            2