Convert FTDC metrics to OpenTelemetry


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Fully Compatible
    • Storage Execution 2026-03-30

      replicated_fast_count_manager.cpp contains numerous server status metrics:

      // Boolean flag indicating whether or not the fast count background thread is currently running.
      Atomic<bool> isRunning{false};

      // Flushes persist fast count information to the oplog and occur during checkpointing,
      // shutdown, step down, etc. The total number of flush attempts = flushSuccessCount +
      // flushFailureCount.
      Atomic<int64_t> flushSuccessCount{0};
      Atomic<int64_t> flushFailureCount{0};
      Atomic<int64_t> flushTimeMsMin{std::numeric_limits<int64_t>::max()};
      Atomic<int64_t> flushTimeMsMax{0};
      Atomic<int64_t> flushTimeMsTotal{0};

      // Aggregate metrics for the min/max number of documents inserted or updated during one flush.
      Atomic<int64_t> flushedDocsMin{std::numeric_limits<int>::max()};
      Atomic<int64_t> flushedDocsMax{0};
      // The total number of documents written during flushes. Used to compute the average flush size.
      Atomic<int64_t> flushedDocsTotal{0};

      // The number of times an empty diff is found when writing an update to the replicated fast
      // count collection.
      Atomic<int64_t> emptyUpdateCount{0};

      // The number of inserts into a new record for storing size and count data.
      Atomic<int64_t> insertCount{0};
      // The number of updates to an existing record storing size and count data.
      Atomic<int64_t> updateCount{0};
      // The total time spent writing metadata to the replicated fast count collection.
      // writeTimeMsTotal / flushTimeMsTotal = the proportion of iteration time writing dirty
      // metadata.
      Atomic<int64_t> writeTimeMsTotal{0};
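The comments above define a few derived quantities: total flush attempts (flushSuccessCount + flushFailureCount), the average flush size (derived from flushedDocsTotal), and the share of flush time spent writing metadata (writeTimeMsTotal / flushTimeMsTotal). A minimal sketch of computing them from a plain snapshot of the counters, assuming the average flush size is taken over successful flushes only. The FastCountFlushSnapshot struct and the helper function names are hypothetical, not part of the codebase.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical plain snapshot of the counters listed above (not a MongoDB type).
struct FastCountFlushSnapshot {
    int64_t flushSuccessCount = 0;
    int64_t flushFailureCount = 0;
    int64_t flushTimeMsTotal = 0;
    int64_t flushedDocsTotal = 0;
    int64_t writeTimeMsTotal = 0;
};

// Total flush attempts, as stated in the comment: successes + failures.
int64_t flushAttempts(const FastCountFlushSnapshot& s) {
    return s.flushSuccessCount + s.flushFailureCount;
}

// Average documents written per flush.
// Assumption: averaged over successful flushes only. Empty if none have happened.
std::optional<double> averageFlushSize(const FastCountFlushSnapshot& s) {
    if (s.flushSuccessCount == 0) {
        return std::nullopt;
    }
    return static_cast<double>(s.flushedDocsTotal) / static_cast<double>(s.flushSuccessCount);
}

// Fraction of flush time spent writing dirty metadata: writeTimeMsTotal / flushTimeMsTotal.
// Empty when no flush time has been recorded, to avoid dividing by zero.
std::optional<double> writeTimeProportion(const FastCountFlushSnapshot& s) {
    if (s.flushTimeMsTotal == 0) {
        return std::nullopt;
    }
    return static_cast<double>(s.writeTimeMsTotal) / static_cast<double>(s.flushTimeMsTotal);
}
```

Returning std::optional keeps the "no data yet" case distinct from a genuine zero, which matters for the same reason the min/max values need special handling before the first flush.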

      Replace each of these metrics with its corresponding mongo::otel::metrics metric.

      generateSection() reports the min and max values only under certain conditions, and there is also a
      recordFlush() member function. Preserve these implementation details by authoring a class or struct
      that wraps the otel metrics and provides an API for modifying them. Define this class in
      replicated_fast_count_metrics.{h, cpp}.
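One possible shape for such a wrapper, as a sketch only. std::atomic stands in for the mongo::otel::metrics metrics, whose API is not reproduced here; the class name FastCountMetrics, the recordFlush() signature, and the accessors are hypothetical. The ticket does not spell out the conditions in generateSection(), so this sketch assumes min/max are reported only after at least one successful flush, since before that they still hold their sentinel initial values.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical wrapper around the flush metrics. std::atomic stands in for the
// mongo::otel::metrics metrics; the real wrapper would hold those instead.
class FastCountMetrics {
public:
    struct MinMax {
        int64_t min;
        int64_t max;
    };

    // Records one flush attempt. Failed flushes only bump the failure counter;
    // successful flushes also update the duration and document aggregates.
    void recordFlush(bool success, int64_t durationMs, int64_t docsFlushed) {
        if (!success) {
            _flushFailureCount.fetch_add(1);
            return;
        }
        _flushTimeMsTotal.fetch_add(durationMs);
        _flushedDocsTotal.fetch_add(docsFlushed);
        _updateMin(_flushTimeMsMin, durationMs);
        _updateMax(_flushTimeMsMax, durationMs);
        _updateMin(_flushedDocsMin, docsFlushed);
        _updateMax(_flushedDocsMax, docsFlushed);
        // Incremented last (seq_cst) so a reader that observes a nonzero success
        // count also observes the min/max written by that flush.
        _flushSuccessCount.fetch_add(1);
    }

    // Assumed reporting condition: min/max are only meaningful once at least
    // one successful flush has been recorded.
    std::optional<MinMax> flushTimeMs() const {
        if (_flushSuccessCount.load() == 0) {
            return std::nullopt;
        }
        return MinMax{_flushTimeMsMin.load(), _flushTimeMsMax.load()};
    }

    std::optional<MinMax> flushedDocs() const {
        if (_flushSuccessCount.load() == 0) {
            return std::nullopt;
        }
        return MinMax{_flushedDocsMin.load(), _flushedDocsMax.load()};
    }

    int64_t flushAttempts() const {
        return _flushSuccessCount.load() + _flushFailureCount.load();
    }

    int64_t flushedDocsTotal() const {
        return _flushedDocsTotal.load();
    }

private:
    // Lock-free "store if smaller": compare_exchange_weak reloads `current` on
    // failure, so the loop ends once the store succeeds or `value` is no longer smaller.
    static void _updateMin(std::atomic<int64_t>& target, int64_t value) {
        int64_t current = target.load();
        while (value < current && !target.compare_exchange_weak(current, value)) {
        }
    }

    // Same pattern for "store if larger".
    static void _updateMax(std::atomic<int64_t>& target, int64_t value) {
        int64_t current = target.load();
        while (value > current && !target.compare_exchange_weak(current, value)) {
        }
    }

    std::atomic<int64_t> _flushSuccessCount{0};
    std::atomic<int64_t> _flushFailureCount{0};
    std::atomic<int64_t> _flushTimeMsMin{std::numeric_limits<int64_t>::max()};
    std::atomic<int64_t> _flushTimeMsMax{0};
    std::atomic<int64_t> _flushTimeMsTotal{0};
    std::atomic<int64_t> _flushedDocsMin{std::numeric_limits<int64_t>::max()};
    std::atomic<int64_t> _flushedDocsMax{0};
    std::atomic<int64_t> _flushedDocsTotal{0};
};
```

Keeping the min/max condition inside the wrapper means callers such as a serverStatus section or an OTel callback cannot accidentally report the sentinel values, and it gives unit tests a single API to exercise.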

      Write unit tests in replicated_fast_count_metrics_test.cpp following the guidance in
      otel/metrics/README.md. Evaluate whether replicated_fast_count_server_status.js still
      provides useful test coverage.

      As part of this change, check whether there is an OTel ObservableMutex API that can be used for _metadataMutex.

      Retain all documentation and metric call sites.

            Assignee:
            Cedric Sirianni
            Reporter:
            Cedric Sirianni
            Votes:
            0
            Watchers:
            2