We want to capture metrics related to the availability impact of replicated index builds, specifically the problem described by SERVER-112315.
We should expose two serverStatus metrics:
- The last amount of time spent between receiving a commitIndexBuild oplog entry and the index build being committed. This should be captured regardless of replication state (secondary or startup recovery).
- The last amount of time spent between voting for commit (as self) and actually committing.
This will tell us how long a node was blocking replication by building indexes.
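A minimal sketch of how the two durations could be recorded, assuming a per-build timer that publishes into a shared metrics struct. Every name here (`IndexBuildCommitMetrics`, `IndexBuildCommitTimer`, the field names, the millisecond unit) is hypothetical and chosen for illustration; none of it is taken from the server codebase or from the final serverStatus field names.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical names throughout; this is an illustration, not server code.
using Clock = std::chrono::steady_clock;

// "Last observed" values that a serverStatus section could report.
struct IndexBuildCommitMetrics {
    std::atomic<std::int64_t> lastCommitOplogEntryToCommitMillis{0};
    std::atomic<std::int64_t> lastVoteToCommitMillis{0};
};

// Tracks the timestamps of a single index build and publishes the
// durations once the build is committed.
class IndexBuildCommitTimer {
public:
    explicit IndexBuildCommitTimer(IndexBuildCommitMetrics& metrics) : _metrics(metrics) {}

    // Called when this node votes for commit.
    void onVotedForCommit(Clock::time_point now) { _votedForCommit = now; }

    // Called when the commitIndexBuild oplog entry is received.
    void onCommitOplogEntryReceived(Clock::time_point now) { _oplogEntryReceived = now; }

    // Called when the index build is actually committed.
    void onCommitted(Clock::time_point now) {
        if (_oplogEntryReceived) {
            _metrics.lastCommitOplogEntryToCommitMillis.store(
                millisBetween(*_oplogEntryReceived, now));
        }
        // No vote is recorded in some paths (assumed here: startup recovery),
        // in which case the vote metric keeps its previous value.
        if (_votedForCommit) {
            _metrics.lastVoteToCommitMillis.store(millisBetween(*_votedForCommit, now));
        }
    }

private:
    static std::int64_t millisBetween(Clock::time_point from, Clock::time_point to) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(to - from).count();
    }

    IndexBuildCommitMetrics& _metrics;
    std::optional<Clock::time_point> _votedForCommit;
    std::optional<Clock::time_point> _oplogEntryReceived;
};
```

Passing the timestamps in as arguments, rather than reading the clock inside the timer, keeps the sketch easy to test with fixed time points. A monotonic clock is used so that wall-clock adjustments do not distort the measured durations.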
Is related to:
- SERVER-112315 Avoid full index rebuild during startup when crashing after commit (In Code Review)