Type: Investigation
Resolution: Declined
Priority: Major - P3
Fix Version/s: No version
Affects Version/s: None
Component/s: None
Labels: None
Documentation Changes: Not Needed
Assigned Teams: Developer Tools

Original Downstream Change Summary

This project adds:

New metrics in the serverStatus command response
Extensions to the currentOp command response
A new option for the aggregate and getMore commands
Extensions to the aggregate and getMore command responses (opt-in, returned only when the new option is used)
Extensions to query stats

Description of Linked Ticket

Epic Summary

Summary

This project will provide additional metrics for Change Streams, giving our customers and support engineers the observability they need to troubleshoot change stream issues, monitor resource usage, and predict possible errors and outages. Current change stream observability is very limited and has been the subject of numerous discussions during support ticket processing and on the #change-streams channel. The necessary metrics will be added to the serverStatus and currentOp command outputs, and possibly to the aggregate and getMore command outputs. We will also evaluate exposing them through OpenTelemetry.

This is a critical correctness and enterprise-readiness initiative required to support high-value internal customers (Atlas Search, Atlas Stream Processing) and external enterprise users, as mentioned in the approved project idea document.

Motivation

Many enterprise customers depend on the Change Streams feature for real-time change event processing. However, the current lack of observability creates significant business risks:

Strategic Alignment: Enabling AI Transformation (Atlas Search): Atlas Search is the cornerstone of our AI strategy (powering Vector Search and RAG applications). Atlas Search relies on change streams to keep the search and vector indexes in sync with MongoDB's data set. To stay competitive, search must ingest data at massive scale. Today, however, it is difficult and labor-intensive to investigate why Atlas Search indexing is not performing as expected.
High Support Costs & TTR: When a change stream lags or disconnects, our technical support team has no immediate way to determine whether the cause is a slow client, network latency, or server-side resource contention. This extends Time-To-Resolution (TTR) and wastes engineering hours on deep-dive debugging.
Capacity Planning Guesswork: Customers (like Atlas Search) cannot effectively scale their usage because they cannot measure the resource cost (CPU/IO) of opening additional streams. This limits their adoption of the feature.
Silent Failures: Critical applications (like Atlas Stream Processing) run the risk of 'falling off the oplog' (becoming unable to catch up) without any warning. Currently, there is no metric that can alert a user before this catastrophic data-loss state occurs.

Please refer to the supporting customer cases in the linked project idea document and the related feature request in Aha!.

Documentation

Project Proposal

Docs Update
Syntax
Technical Design
Scope

Assignee: Unassigned
Reporter: Backlog - Core Eng Program Management Team
Votes: 0
Watchers: 3

Created: Mar 13 2026 04:38:30 PM UTC
Updated: Mar 31 2026 01:52:52 PM UTC
Resolved: Mar 31 2026 01:52:20 PM UTC
