- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Networking & Observability
- 3
- TBD
mongos uses the maxTimeMS provided with a request to set a deadline for processing that request, including any egress networking performed on its behalf. If the deadline is hit while mongos is performing networking, the egress work is canceled immediately, which often means the connection involved must be closed. This cancellation is implemented in two different ways:
- Implicitly, by interrupting the thread that is running the baton (e.g. in the ARS). This results in "Baton wait canceled" errors.
- Explicitly, in NetworkInterfaceTL, by canceling the networking after a timer fires.
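To illustrate why a canceled request usually costs the connection, here is a minimal Python sketch. It is a toy model, not mongos code: `FakeConnection` and `run_request` are hypothetical names, and the model assumes a simple protocol in which replies come back in request order. Once the caller stops waiting, the unread reply is still on the connection, so reusing it without draining would hand the next caller a stale reply.

```python
from collections import deque


class FakeConnection:
    """Toy stand-in for one egress connection: replies arrive in request order."""

    def __init__(self):
        self._unread = deque()  # replies the remote sent that nobody has read yet

    def send(self, request_id):
        # The toy remote always answers, even if the caller stops waiting.
        self._unread.append(f"reply-to-{request_id}")

    def read_reply(self):
        return self._unread.popleft()


def run_request(conn, request_id, deadline_hit):
    conn.send(request_id)
    if deadline_hit:
        # Cancellation: the caller walks away without reading the reply.
        return None
    return conn.read_reply()


conn = FakeConnection()
first = run_request(conn, 1, deadline_hit=True)    # canceled at the deadline
second = run_request(conn, 2, deadline_hit=False)  # reuses the same connection
print(first, second)
```

In this toy model the second request receives the reply meant for the first, which is why a connection with canceled work in flight is typically closed rather than reused as-is.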
During periods of high load, when timeouts may be more common, this connection churn can induce a feedback loop: more timeouts, more churn, more work for the reactor, and ultimately unavailability. We should update mongos' egress networking to enforce deadlines and timeouts without sacrificing egress connections. This would reduce the risk of unavailability and make the connection pool a more reliable way to constrain concurrency on mongos.
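The ticket does not specify a design, so the following Python asyncio sketch only illustrates one possible direction under stated assumptions. `ToyConnection`, `ToyPool`, `request`, and `simulate` are hypothetical names, and the remote is simulated with a sleep. The sketch compares two policies on deadline expiry: canceling the networking and discarding the connection, versus reporting the timeout promptly while the pending reply drains in the background before the connection goes back to the pool.

```python
import asyncio


class ToyConnection:
    """Hypothetical connection whose remote replies after `delay` seconds."""

    def __init__(self):
        self.closed = False

    async def round_trip(self, delay):
        await asyncio.sleep(delay)  # stands in for send + wait for reply
        return "ok"


class ToyPool:
    """Hypothetical pool that counts how many connections it had to create."""

    def __init__(self):
        self.idle = []
        self.created = 0

    def acquire(self):
        if self.idle:
            return self.idle.pop()
        self.created += 1
        return ToyConnection()

    def release(self, conn):
        if not conn.closed:
            self.idle.append(conn)


async def request(pool, delay, timeout, keep_connection):
    conn = pool.acquire()
    reply = asyncio.ensure_future(conn.round_trip(delay))
    done, _ = await asyncio.wait({reply}, timeout=timeout)
    if done:
        pool.release(conn)
        return reply.result()
    if keep_connection:
        # Deadline hit: report the timeout now, but let the reply drain in the
        # background and only then hand the connection back to the pool.
        reply.add_done_callback(lambda _: pool.release(conn))
    else:
        # Deadline hit: cancel the networking and discard the connection.
        reply.cancel()
        conn.closed = True
    return "timeout"


async def simulate(keep_connection):
    pool = ToyPool()
    results = []
    for _ in range(3):
        results.append(
            await request(pool, delay=0.05, timeout=0.01, keep_connection=keep_connection)
        )
        await asyncio.sleep(0.1)  # idle gap, long enough for a pending reply to drain
    return results, pool.created


churn_results, churn_created = asyncio.run(simulate(keep_connection=False))
keep_results, keep_created = asyncio.run(simulate(keep_connection=True))
print(churn_created, keep_created)
```

In both runs every caller sees a timeout at its deadline. The discard policy creates a new connection for each request, while the drain policy keeps reusing the first one. Because a draining connection stays checked out until its reply arrives, the pool size continues to bound in-flight work in this model.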
- related to
  - SERVER-90756 Prefer returning remote MaxTimeMSExpired error over local timeout in NetworkInterfaceTL (Backlog)
  - DRIVERS-2884 CSOT avoid connection churn when operations timeout (In Review)