- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- Storage Execution 2026-07-06
- 200
Summary
The index_builds.js library sometimes reads log messages to check properties of resumable index builds. This mode of testing is inherently fragile because the log buffer is circular/bounded. If the log buffer limit is exceeded, the relevant log line can be overwritten before the test reads it, and the test fails incorrectly.
We should replace log checks with metric checks in index_builds.js.
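As a rough illustration of the failure mode, the sketch below models a bounded log as a toy class. BoundedLog, its capacity, and the unrelated log IDs are all hypothetical; this is not how the server or the test library stores logs. It only shows why a counter survives where a log line does not.

```javascript
// Toy model only: a bounded log that evicts the oldest entry when full.
// BoundedLog and its capacity are hypothetical, not a server or shell API.
class BoundedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // the oldest entry is lost
    }
  }

  contains(id) {
    return this.entries.some((entry) => entry.id === id);
  }
}

// A counter-style metric is updated in place and is never evicted.
const counters = new Map();

const log = new BoundedLog(3);
log.append({id: 4841700}); // the log line the test wants to find
counters.set("index_builds.resume.started", 1);

// Unrelated log traffic (placeholder IDs) fills the buffer before the test reads it.
for (let i = 0; i < 3; i++) {
  log.append({id: 1000 + i});
}

const resumeLogStillVisible = log.contains(4841700);
const resumeStartedCount = counters.get("index_builds.resume.started");
```

In this model the log check fails even though the event happened, while the counter still reflects it.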
Motivation
Consider checkResume(), which checks for log ID 4841700 to "ensure that the resume info contains the correct phase to resume from." This log ID was emitted but lost in BF-43501. BF-40886 was also caused by this issue.
Proposed Solution
Existing log checks for index build resume state can be replaced with OpenTelemetry metrics. index_builds.resume.started with a phase attribute would address BF-43501.
This approach is already used in SPM-4469. See PrimaryDrivenResumableIndexBuildTest._readResumeMetrics and its usage. Since checkResume() does not currently have access to the OTel exporter, this change would require:
- Creating a metrics directory (createMetricsDirectory(jsTestName()))
- Passing openTelemetryMetricsDirectory() to the mongod nodes via a server parameter
- Snapshotting the relevant metric(s) before the node restart that triggers the resume.
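
The comparison step could look roughly like the sketch below. It assumes a metric snapshot has already been parsed into an array of {name, attributes, value} data points; that shape, the helper names counterValue, resumeStartedDelta, and checkResumePhase, and the phase string used in the usage lines are all hypothetical and are not taken from _readResumeMetrics or the exporter's actual output format. It also assumes counter values are comparable across the restart.

```javascript
// Sketch only. The snapshot shape is an assumption, not the real exporter format:
// each snapshot is an array of data points {name, attributes, value}.

// Sum every data point matching the metric name and all given attributes.
function counterValue(snapshot, name, attributes) {
  return snapshot
    .filter((point) => point.name === name &&
      Object.entries(attributes).every(
        ([key, value]) => (point.attributes || {})[key] === value))
    .reduce((sum, point) => sum + point.value, 0);
}

// Increase in index_builds.resume.started for one phase between two snapshots.
function resumeStartedDelta(before, after, phase) {
  const name = "index_builds.resume.started";
  return counterValue(after, name, {phase}) - counterValue(before, name, {phase});
}

// Hypothetical replacement for the log check: throw unless at least one
// resume was recorded for the expected phase.
function checkResumePhase(before, after, expectedPhase) {
  const delta = resumeStartedDelta(before, after, expectedPhase);
  if (delta < 1) {
    throw new Error(
      `expected a resume from phase "${expectedPhase}", counter delta was ${delta}`);
  }
  return delta;
}

// Usage with hand-written snapshots; the phase string is a placeholder.
const before = [
  {name: "index_builds.resume.started", attributes: {phase: "collection scan"}, value: 0},
];
const after = [
  {name: "index_builds.resume.started", attributes: {phase: "collection scan"}, value: 1},
];
const delta = checkResumePhase(before, after, "collection scan");
```

Comparing a before/after delta rather than an absolute value keeps the check independent of any resumes recorded earlier in the same test.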