Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Networking & Observability
Overview
Extend the OTel counter mongodb.serverStatus.asserts (added in SERVER-128460) with a command attribute alongside the existing kind attribute, so operators can break down assertion-failure rates by the command that was running when the assertion fired.
Background
SERVER-128460 implemented approach (b) from a #server-networking-observability design discussion: a single int64 counter with a kind attribute ∈ {regular, msg, user, tripwire}. Approach (c), which adds a command attribute on top, was deferred because:
- It requires threading the running command name through the assertion path; the assertion bump sites currently have no access to the in-flight Command*.
- The resulting cardinality (kinds × commands) is higher than we want on by default, so this variant needs a runtime knob that deployments wanting the breakdown can opt into.
Today the observer signature is void(AssertionKind) noexcept in src/mongo/util/assert_util.h; the four bump sites in src/mongo/util/assert_util.cpp (bumpAssertion) do not carry a command context.
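A minimal sketch of one way the observer could be widened, assuming the command name reaches the bump site as a std::string_view (empty when no command is in flight) rather than as a Command*. AssertionKind mirrors the four kind values above; AssertionObserver, setAssertionObserver, bumpAssertionSketch, and AssertsCounter are hypothetical stand-ins, not the real assert_util.h or OpenTelemetry C++ API. The in-memory counter keeps one series per (kind, command) tuple, which makes the kinds × commands cardinality concern concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <string_view>
#include <utility>

// Mirrors the four `kind` attribute values from SERVER-128460.
enum class AssertionKind { kRegular, kMsg, kUser, kTripwire };

// Hypothetical widened observer; today's signature is void(AssertionKind) noexcept.
// An empty `command` means no command was running when the assertion fired.
using AssertionObserver = void (*)(AssertionKind kind, std::string_view command) noexcept;

// Hypothetical in-memory stand-in for an OTel int64 counter:
// one series per distinct (kind, command) attribute tuple.
class AssertsCounter {
public:
    void add(AssertionKind kind, std::string_view command) {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_series[Key{kind, std::string(command)}];
    }
    std::int64_t value(AssertionKind kind, std::string_view command) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _series.find(Key{kind, std::string(command)});
        return it == _series.end() ? 0 : it->second;
    }
    std::size_t seriesCount() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _series.size();
    }

private:
    using Key = std::pair<AssertionKind, std::string>;
    mutable std::mutex _mutex;
    std::map<Key, std::int64_t> _series;
};

AssertsCounter gAssertsCounter;

void otelAssertsObserver(AssertionKind kind, std::string_view command) noexcept {
    try {
        gAssertsCounter.add(kind, command);
    } catch (...) {
        // Allocation or lock failure: drop this sample rather than terminate
        // inside a noexcept observer.
    }
}

// Assumed to be installed once at startup, before any assertion can fire.
AssertionObserver gAssertionObserver = nullptr;

void setAssertionObserver(AssertionObserver observer) {
    assert(observer != nullptr);  // installing a null observer is a bug in this sketch
    gAssertionObserver = observer;
}

// Hypothetical stand-in for one of the bumpAssertion sites in assert_util.cpp.
void bumpAssertionSketch(AssertionKind kind, std::string_view command) noexcept {
    if (gAssertionObserver) {
        gAssertionObserver(kind, command);
    }
}
```

With the knob off, every bump would pass an empty command, so the counter stays at no more than four series (one per kind); with it on, the series count grows with the number of distinct commands that assert.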
Scope of Work
1. Thread command context through the assertion path
2. Add the command attribute and the opt-in knob
3. Test coverage
Open Questions
- Should the command attribute carry the running command's name (e.g. find, insert) or the external command name (which may differ for aliased commands)? Likely the former; confirm with #server-networking-observability.
- How is the command name plumbed to the assertion bump sites — a thread-local set by the command dispatcher, or an OperationContext decoration read by bumpAssertion? Decoration is cleaner but assert_util sits below most of the server; a thread-local set/cleared by the command-dispatch RAII helper may be the path of least dependency.
- Should disabled mode skip the observer dispatch entirely, or always dispatch and let the counter implementation drop the unused attribute? The choice trades performance against simplicity.
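A sketch of the thread-local option from the second question, combined with one possible answer to the third. It assumes a hypothetical RAII guard (CommandNameScope) that the command dispatcher would construct around execution, a thread_local pointer it sets and restores, and a hypothetical std::atomic<bool> standing in for the new server parameter. None of these names exist in the server today. This variant takes a middle path: the bump site always dispatches, but reads the thread-local only when the knob is on, so the disabled path costs one relaxed atomic load.

```cpp
#include <atomic>
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the opt-in server parameter; off by default.
std::atomic<bool> gAssertsCommandAttributeEnabled{false};

// Name of the command running on this thread, or nullptr outside dispatch.
// Assumes the pointed-to name outlives the scope (e.g. a static string owned by the command).
thread_local const char* tlRunningCommand = nullptr;

// Hypothetical RAII helper the command dispatcher would construct around execution.
class CommandNameScope {
public:
    explicit CommandNameScope(const char* name) : _prev(tlRunningCommand) {
        assert(name != nullptr);
        tlRunningCommand = name;
    }
    // Restores the previous name rather than clearing it, so a nested scope
    // (if one ever exists) does not erase the outer command's name.
    ~CommandNameScope() { tlRunningCommand = _prev; }

    CommandNameScope(const CommandNameScope&) = delete;
    CommandNameScope& operator=(const CommandNameScope&) = delete;

private:
    const char* _prev;
};

// Value a bump site would pass as the `command` attribute.
std::string_view commandAttributeForAssertion() noexcept {
    if (!gAssertsCommandAttributeEnabled.load(std::memory_order_relaxed)) {
        return {};  // knob off: kind-only behavior; the thread-local is never read
    }
    return tlRunningCommand ? std::string_view(tlRunningCommand) : std::string_view{};
}
```

Because assert_util sits below most of the server, this keeps the dependency one-directional: the dispatcher writes the thread-local, and the assertion path only reads it. Which name to store (running or external) is the first open question and is not settled by this sketch.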
Acceptance Criteria
- mongodb.serverStatus.asserts carries a command attribute when the new server parameter is enabled; existing kind-only behavior is preserved when disabled.
- No regression in assertion-path latency when the knob is off (microbenchmark or perf-required patch shows no statistically significant delta).
- New jstest exercises both modes; existing otel_asserts_metric_file_export.js continues to pass.
- TODO comment referencing this ticket in asserts_otel_metric.cpp is removed.
- is related to: SERVER-94409 Add ErrorCode counters to FTDC and serverStatus (Open)
- related to: SERVER-128460 Add OTel metrics for failed commands (Closed)