- Type: Epic
- Resolution: Done
- Priority: Major - P3
- Component/s: Performance, Retryability, Server Selection
There are several scenarios in which it would be useful to redirect reads or writes to a different mongos.
- A MongoDB sharded cluster deployment may find itself in a situation where a mongos reports itself as healthy but is unable to execute any queries. The driver attempted to retry the failing queries, but in a number of cases it selected the same mongos that had failed in the first place, so the retry failed for the same reason as the original attempt and the error was propagated to the application.
- Currently, when the driver is in a sharded topology, the server selection spec requires a random server to be selected for each operation. This permits the same failed mongos to be selected for both an operation and its retry, so the query fails even when there are healthy mongoses in the deployment that could have executed it successfully.
The suggested improvement is for the driver, when in a sharded cluster topology, to:
- Track whether a server selection request is for the first attempt or for a retry.
- Track the server used for the first attempt.
- When selecting the server for the retry, if there are multiple eligible mongoses, select randomly from the mongoses other than the one used for the first attempt.
- Bonus (nice to have): determine whether a mongos is healthy before making the attempt and, if it is unhealthy, exclude it from selection.
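The first three steps could be sketched roughly as follows. This is a minimal illustration, not code from any driver: `select_mongos`, `run_with_retry`, the use of plain address strings for servers, and the treatment of `ConnectionError` as the retryable error are all assumptions made for the example. The health-check bonus item is not covered.

```python
import random
from typing import Callable, Optional, Sequence, Set, TypeVar

T = TypeVar("T")


def select_mongos(candidates: Sequence[str], deprioritized: Set[str]) -> str:
    """Pick a random eligible mongos, avoiding deprioritized ones if possible.

    Hypothetical helper: servers are represented by address strings.
    """
    if not candidates:
        raise RuntimeError("no eligible mongos available")
    preferred = [addr for addr in candidates if addr not in deprioritized]
    # If every eligible mongos has already failed (for example, there is
    # only one), fall back to the full list instead of failing selection.
    return random.choice(preferred or list(candidates))


def run_with_retry(
    candidates: Sequence[str],
    operation: Callable[[str], T],
    max_attempts: int = 2,
) -> T:
    """Run `operation` against a mongos, retrying on a different one.

    Assumption: ConnectionError stands in for whatever errors a real
    driver classifies as retryable.
    """
    deprioritized: Set[str] = set()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        server = select_mongos(candidates, deprioritized)
        try:
            return operation(server)
        except ConnectionError as exc:
            # Remember the server used for the failed attempt so that the
            # retry selects from the other mongoses.
            deprioritized.add(server)
            last_error = exc
    assert last_error is not None
    raise last_error
```

With two mongoses where one always fails, the retry is directed to the other one, so the operation succeeds no matter which server the first attempt picks. With a single mongos, the fallback means the retry still goes to that same server.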
Cast of Characters:
Product Manager for Feature: alex.bevilacqua@mongodb.com
Program Manager: tom.selander@mongodb.com
Engineering Lead: dmitry.rybakov@mongodb.com
- causes
  - DRIVERS-2901 Clarify the intent behind the list of deprioritized mongos'es and fix the pseudocode (Needs Triage)
- depends on
  - SERVER-53287 Improve cluster/mongos health observability (Closed)
- is related to
  - DRIVERS-1842 Drivers should retry authentication errors when connection handshake fails (Backlog)
  - DRIVERS-2140 Clarify Auth Spec and Clean Up Error Section (Backlog)
- related to
  - SERVER-50459 Include "source" field in error responses from mongos (Backlog)
  - DRIVERS-2828 Update prose tests for mongos deprioritization during retryable ops (Implementing)
- split to
  - PHPLIB-1459 Direct read/write retries to another mongos if possible (Closed)
  - CDRIVER-4099 Direct read/write retries to another mongos if possible (Closed)
  - CSHARP-3757 Direct read/write retries to another mongos if possible (Closed)
  - CXX-2320 Direct read/write retries to another mongos if possible (Closed)
  - GODRIVER-2101 Direct read/write retries to another mongos if possible (Closed)
  - JAVA-4254 Direct retries to another mongos if one is available (Closed)
  - MOTOR-792 Direct read/write retries to another mongos if possible (Closed)
  - NODE-3470 Direct read/write retries to another mongos if possible (Closed)
  - PYTHON-2834 Direct read/write retries to another mongos if possible (Closed)
  - RUBY-2748 Direct read/write retries to another mongos if possible (Closed)
  - RUST-935 Direct read/write retries to another mongos if possible (Closed)