2025-11-21 - 🟢 On Track
Engineer(s): Shane Harvey, Steve Silvester, Iris Ho, Bailey Pearson
What was accomplished since the last update?
- DRIVERS-1934 withTransaction spec 1st and 2nd implementation drafts complete, only waiting on transaction spec owner review
- Node draft implementation for connection rate limiter (DRIVERS-3218) is complete, spec awaiting answer to one open question
- Began work to understand the behavior of drivers when only a single mongos is overloaded (PERF-6964)
- Have an initial draft and python POC for backoff and jitter changes (DRIVERS-3239)
What's the focus over the next two weeks?
Any risks/blockers/impediments?
- If the outcome of PERF-6964 indicates that DRIVERS-3217 is mandatory, it will add scope to the project (2-4 weeks per driver). While implementing this workload, we hit an unexpected issue: DSI’s set_server_parameter was not enabling the server parameters on replica set secondaries, which required additional work in DSI to resolve.
2025-11-06 - 🟢 On Track
Engineer(s): Shane Harvey, Steve Silvester, Iris Ho, Bailey Pearson
What was accomplished since the last update?
- Draft implementation for Node (2nd implementation for spec) complete on DRIVERS-1934, pending design finalization
- Started Node draft implementation for connection rate limiter (DRIVERS-3218)
- Token bucket in sustained overload workload (PERF-7397) completed
- Result: ~90% reduction in average latency during sustained overload, with similar reductions in 95th and 99th percentile latency, at the expense of a 2.8x increase in error rate
- Reviewed Decision for Connection Rate Limiting (WRITING-34116). The agreed plan is to roll out the connection rate limiter with a conservative 20-second queue (ingressConnectionEstablishmentBurstCapacitySecs).
- This conservative limit avoids rejecting connections that the server has a good chance of accepting within the driver's connect timeout. Rejections matter because older drivers will clear the connection pool and induce a metastable failure. Once customers upgrade to backpressure-enabled drivers, we can decrease this queuing time.
- Gained insight into individual language challenges arising from the originally proposed design and revised the design proposal to mitigate these:
- We benchmarked the new Pool backoff state separately from the "don't clear the connection pool" change and found no difference in the workload. We decided we can safely cut the pool backoff state to reduce the scope. This will simplify the implementation estimates for drivers.
- We also decided to drop the requirement to interrupt pending connections to 1) reduce the scope and 2) avoid extra churn on connection creation attempts that might succeed.
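For readers unfamiliar with the mechanism behind the token bucket workload (PERF-7397), the following is a minimal, generic sketch of a token bucket. It is not the design under review: the class name, parameters, and the injected clock are all hypothetical and chosen only for illustration. It shows the general trade-off reported above, where a caller that finds the bucket empty fails fast instead of adding load, which lowers latency but surfaces more errors.

```python
import time


class TokenBucket:
    """Generic token bucket (hypothetical names; not the driver design)."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)              # maximum tokens held
        self.refill_per_sec = float(refill_per_sec)  # tokens added per second
        self._clock = clock                          # injectable for testing
        self._tokens = float(capacity)               # start with a full bucket
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_sec)

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens and return True, or return False if too few remain."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would check `bucket.try_acquire()` before sending work and report an error immediately when it returns False.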
What's the focus over the next two weeks?
- Finalize design and estimates/delivery timelines for the project
- Finalize spec changes (including 2nd implementation) for DRIVERS-1934
Any risks/blockers/impediments?
2025-10-24 - 🟢 On Track
Engineer(s): Shane Harvey, Steve Silvester, Iris Ho
What was accomplished since the last update?
- Operation burst workload to demonstrate benefit of client backpressure (PERF-7190)
- Results: Without any retries the workload encounters ~8000 overload errors. With 3 max retries the workload encounters ~500 errors. With 5 max retries the workload encounters ~5 errors.
- Main learning: 3 retries is not sufficient. We need to increase the limit to 5 or more.
- Moved the Design for Client Backpressure into review (WRITING-32696).
- Spec changes for withTransaction backpressure are in draft
- Token bucket workload in review (PERF-7397)
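To make the retry-limit finding from PERF-7190 concrete, here is a minimal sketch of a bounded retry loop with exponential backoff and full jitter. It is an illustration under stated assumptions, not the spec'd driver behavior: `OverloadError`, `retry_with_backoff`, and the default delay values are hypothetical, and only the retry limit of 5 comes from the learning above.

```python
import random
import time


class OverloadError(Exception):
    """Hypothetical stand-in for a server overload error."""


def retry_with_backoff(op, max_retries=5, base_delay=0.1, max_delay=10.0,
                       sleep=time.sleep, rng=random.random):
    """Call op(), retrying on OverloadError up to max_retries times.

    base_delay and max_delay are assumed values for illustration only.
    """
    attempt = 0
    while True:
        try:
            return op()
        except OverloadError:
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the error
            # Full jitter: sleep a random fraction of the capped exponential delay.
            delay = rng() * min(max_delay, base_delay * (2 ** attempt))
            sleep(delay)
            attempt += 1
```

With `max_retries=3`, an operation that fails four times in a row still surfaces an error. With `max_retries=5`, the same operation succeeds on its fifth call.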
What's the focus over the next two weeks?
- Finalize design and estimates for the project
- Finalize all perf workloads (PERF-7190, PERF-7397, PERF-6964 - only one mongos is overloaded)
- Finish drafting connection pool spec changes (DRIVERS-3218)
Any risks/blockers/impediments?
- Estimates for individual driver implementations depend on design finalization
- Perf results for token bucket are inconclusive
2025-10-16 - 🟢 On Track
2025-10-16:
- What was completed over the last two weeks?
- Completed availability workload to verify our improvements to connection rate limiter error handling (PERF-7078). This confirms our expected perf/availability improvements:
- Latency95thPercentile improves from 8979ms to 93.59ms
- OperationThroughput improves from 2444q/s to 6134q/s
- Perf workload to verify client backpressure retry policy in review (PERF-7190)
- What’s the focus over the next two weeks?
- Complete remaining performance workloads PERF-7190 and PERF-6964.
- Complete review of Design document (WRITING-32696).
- Draft spec changes for connection pool and withTransaction projects.
2025-10-03:
- What was completed over the last two weeks?
- Scope document completed (WRITING-32695).
- Merged availability reproducer for withTransaction induced write conflict storm (PERF-7188)
- Started investigation into default values for withTransaction backoff parameters and their effect on latency (PYTHON-5562). Discovered we may want to use the backoff algorithm that the server uses for write conflict retry, developed in SERVER-88000. The main difference is that the backoff grows more gradually.
- Availability workload to verify our improvements to connection rate limiter error handling in progress (PERF-7078)
- What’s the focus over the next two weeks?
- Design document in review (WRITING-32696).
- Begin spec changes for connection pool and withTransaction.
2025-09-29 - 🟢 On Track
2025-08-19:
- What was completed over the last two weeks?
- Scope document is in review (WRITING-32695).
- Python POC work has begun (PYTHON-5504 and PYTHON-5505).
- What’s the focus over the next two weeks?
- Put Design document in review (WRITING-32696).
- Complete Python POC for adaptive retry loop (PYTHON-5505 and PYTHON-5506).
- Demo improved write conflict storm behavior (PYTHON-5504).