Type: Task
Resolution: Unresolved
Priority: Unknown
Fix Version/s: None
Component/s: Backpressure, Retryability
Labels: None

Epic Link: Client Backpressure Improvements
Driver Changes: Needed
Downstream Changes Summary:
Summary of necessary driver changes

Commits for syncing spec/prose tests
(and/or refer to an existing language POC if needed)

Context for other referenced/linked tickets

Summary

We need to extend client backpressure so that the token-bucket retry budget is tracked per server (individual mongod/mongos instance) rather than only per client, allowing retries to be throttled based on the health and overload state of each server independently.
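As a rough illustration of the proposed behavior, the following minimal sketch keeps one token bucket per server address instead of a single bucket per client. It is not taken from the backpressure specification: the class names (`TokenBucket`, `PerServerRetryBudget`), the method names, and the numeric values for capacity, retry cost, and success refund are all hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenBucket:
    """Retry budget: each retry withdraws tokens, each success deposits some back.

    All numeric defaults are hypothetical placeholders, not spec values.
    """
    capacity: float = 10.0
    tokens: float = 10.0
    retry_cost: float = 1.0
    success_refund: float = 0.1

    def try_consume(self) -> bool:
        # Allow the retry only if enough budget remains.
        if self.tokens >= self.retry_cost:
            self.tokens -= self.retry_cost
            return True
        return False

    def deposit(self) -> None:
        # Refill a little on success, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + self.success_refund)


@dataclass
class PerServerRetryBudget:
    """Tracks an independent retry budget per server address (hypothetical API)."""
    capacity: float = 10.0
    buckets: Dict[str, TokenBucket] = field(default_factory=dict)

    def _bucket(self, address: str) -> TokenBucket:
        # Lazily create a full bucket the first time a server is seen.
        if address not in self.buckets:
            self.buckets[address] = TokenBucket(
                capacity=self.capacity, tokens=self.capacity
            )
        return self.buckets[address]

    def can_retry(self, address: str) -> bool:
        return self._bucket(address).try_consume()

    def record_success(self, address: str) -> None:
        self._bucket(address).deposit()
```

With a capacity of 3, repeated retries against one overloaded server drain only that server's bucket, so a retry against a different server is still permitted. A production design would also need to decide how buckets are created and discarded as the topology changes, which this sketch does not address.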

Motivation

Who is the affected end user?

Any client leveraging a MongoDB Driver.

How does this affect the end user?

Without per-server token buckets, overload on one server can exhaust a shared retry budget and reduce retries against other servers, leading to unnecessarily high error rates, lower throughput, and worse latency. Users aren’t completely blocked, but see degraded and harder-to-predict performance, especially when some servers are healthy and others are not.

How likely is it that this problem or use case will occur?

Frequency is hard to determine until we collect data through retry telemetry. The problem is most relevant for multi-node deployments (replica sets, sharded clusters) under uneven or bursty load, where some servers are frequently overloaded while others remain able to serve traffic.

If the problem does occur, what are the consequences and how severe are they?

Primarily a performance and efficiency concern:

Elevated overload error rates from servers that could otherwise be avoided.
Reduced goodput and worse P95/P99 latency compared to what we could achieve with per-server retry budgeting.
In more severe overload, this can exacerbate degradation but is not by itself a hard outage.

Is this issue urgent?

Moderately urgent as a refinement to existing backpressure behavior: it improves how drivers behave under heterogeneous server load but does not block the current backpressure/IWM rollout. No specific date is mandated; priority is Major / Critical depending on IWM roadmap and perf findings.

Is this ticket required by a downstream team?

Yes. It supports downstream Workload Resilience / IWM and performance goals by mapping server overload to client retries more accurately. This benefits Atlas and any product that depends on stable overload behavior, even though the ticket is not tied to a single named consumer.

Is this ticket only for tests?

No. While it will require new perf/validation workloads, this is a functional behavioral change (introducing per-server token-bucket semantics in drivers), not just additional testing.

is blocked by

DRIVERS-3464 Implement server-side handling for retry metadata sent from drivers

Backlog

Assignee: Unassigned
Reporter: Jib Adegunloye
Votes: 0
Watchers: 1

Created: Apr 29 2026 03:38:01 PM UTC
Updated: May 04 2026 03:37:07 PM UTC
