Extend serverStatus.asserts OTel counter with a `command` attribute


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability

      Overview

      Extend the OTel counter mongodb.serverStatus.asserts (added in SERVER-128460) with an additional command attribute alongside the existing kind attribute, so operators can slice assertion-failure rates by which command was running when the assertion fired.

      Background

      SERVER-128460 implemented approach (b) from a #server-networking-observability design discussion: a single int64 counter with a kind attribute ∈ {regular, msg, user, tripwire}. Approach (c), which adds a command attribute on top of (b), was deferred because:

      • It requires threading the running command name through the assertion path (currently the assertion bump site has no access to the in-flight Command*).
      • The resulting cardinality (kinds × commands) is higher than what we want on by default; this variant needs a runtime knob so it can be opted into by deployments that want the breakdown.
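
      The cardinality concern above can be made concrete with a small calculation. This is a minimal sketch: the function name and the command count of 200 used in the usage note are illustrative assumptions, not figures from the server.

```cpp
#include <cassert>
#include <cstddef>

namespace cardinality_sketch {

// The four kinds named in this ticket: regular, msg, user, tripwire.
constexpr std::size_t kNumKinds = 4;

// Upper bound on the number of distinct attribute sets (time series) the
// counter can emit. `numCommands` is a caller-supplied assumption.
constexpr std::size_t maxSeries(std::size_t numCommands, bool commandAttributeEnabled) {
    return commandAttributeEnabled ? kNumKinds * numCommands : kNumKinds;
}

}  // namespace cardinality_sketch
```

      With a hypothetical 200 registered commands, maxSeries(200, false) stays at 4 series while maxSeries(200, true) allows up to 800, which is why the breakdown is proposed as opt-in.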

      Today the observer signature is void(AssertionKind) noexcept in src/mongo/util/assert_util.h; the four bump sites in src/mongo/util/assert_util.cpp (bumpAssertion) do not carry a command context.

      Scope of Work

      1. Thread command context through the assertion path

      2. Add the command attribute and the opt-in knob

      3. Test coverage

      Open Questions

      • Should the command attribute carry the running command's name (e.g. find, insert) or the external command name (which may differ for aliased commands)? Likely the former; confirm with #server-networking-observability.
      • How should the command name be plumbed to the assertion bump sites: via a thread-local set by the command dispatcher, or via an OperationContext decoration read by bumpAssertion? A decoration is cleaner, but assert_util sits below most of the server in the dependency graph, so a thread-local set and cleared by the command-dispatch RAII helper may be the path of least dependency.
      • Should disabled mode skip the observer dispatch entirely, or always dispatch and let the counter implementation drop the unused attribute? Affects perf vs. simplicity.
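
      One way to picture the thread-local option, combined with a runtime knob, is sketched below. Every name here (ScopedCommandName, tlCommandName, gCommandAttributeEnabled, countFor) is hypothetical, and the std::map is only a stand-in for the OTel counter. Unlike the real observer, this bumpAssertion is not noexcept because the map may allocate; a production version would presumably need a non-allocating path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

namespace command_sketch {

enum class AssertionKind { kRegular, kMsg, kUser, kTripwire };

// Hypothetical runtime knob; the real server parameter name is undecided.
std::atomic<bool> gCommandAttributeEnabled{false};

// Name of the command running on this thread; nullptr outside dispatch.
// The pointed-to string must outlive the scope that installs it.
thread_local const char* tlCommandName = nullptr;

// Hypothetical RAII helper held by the command dispatcher while a command
// runs. It restores the previous value so nested dispatch is handled.
class ScopedCommandName {
public:
    explicit ScopedCommandName(const char* name) noexcept : _prev(tlCommandName) {
        tlCommandName = name;
    }
    ~ScopedCommandName() {
        tlCommandName = _prev;
    }
    ScopedCommandName(const ScopedCommandName&) = delete;
    ScopedCommandName& operator=(const ScopedCommandName&) = delete;

private:
    const char* _prev;
};

// Stand-in for the counter: (kind, command) -> count.
// An empty command string means "no command attribute recorded".
std::mutex gMutex;
std::map<std::pair<AssertionKind, std::string>, std::int64_t> gCounts;

// Bump site: reads the thread-local only when the knob is on, so the
// disabled mode records the kind alone.
inline void bumpAssertion(AssertionKind kind) {
    const char* cmd = nullptr;
    if (gCommandAttributeEnabled.load(std::memory_order_relaxed)) {
        cmd = tlCommandName;
    }
    std::lock_guard<std::mutex> lk(gMutex);
    ++gCounts[std::make_pair(kind, std::string(cmd ? cmd : ""))];
}

// Test helper: current count for one (kind, command) pair, 0 if absent.
inline std::int64_t countFor(AssertionKind kind, const std::string& cmd) {
    std::lock_guard<std::mutex> lk(gMutex);
    auto it = gCounts.find(std::make_pair(kind, cmd));
    return it == gCounts.end() ? 0 : it->second;
}

}  // namespace command_sketch
```

      In this sketch the dispatcher would construct a ScopedCommandName around each command invocation. With the knob off, assertions are counted under the kind alone even while a command is in flight; with it on, they are counted under (kind, command). Whether the disabled mode should also skip observer dispatch is left open, as in the question above.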

      Acceptance Criteria

      • mongodb.serverStatus.asserts carries a command attribute when the new server parameter is enabled; existing kind-only behavior is preserved when disabled.
      • No regression in assertion-path latency when the knob is off (microbenchmark or perf-required patch shows no statistically significant delta).
      • New jstest exercises both modes; existing otel_asserts_metric_file_export.js continues to pass.
      • TODO comment referencing this ticket in asserts_otel_metric.cpp is removed.

            Assignee:
            Unassigned
            Reporter:
            Charlie Swanson
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: