Type: Epic
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Component/s: Performance, Retryability, Server Selection
Labels:
None

Epic Name:
Direct read/write retries to another mongos if possible
Documentation Changes:
Not Needed
Downstream Changes Summary:

Drivers should implement the changes to the server selection and read/write retry mechanisms, as well as the new prose tests: specifications@86d961f

2024-02-21: Drivers that have not yet completed this ticket should reference f5bb605 (DRIVERS-2828) for the updated prose test specification.
Epic Status:
To Do

Quarter:
- FY24Q1
- FY24Q2
- FY24Q3
- FY25Q1
Scope Cost Estimate:
0
Cost to Date:
0
Final Cost Estimate:
0
Cost Threshold %:
100
Confidence Status:
None
Latest Project Update:
None
Detailed Project Statuses:

Engineer: Dmitry Rybakov
Summary: When encountering a retryable error, direct the retry attempt to a different mongos if possible.

2023-06-23

Design approved

Ready for implementation in Q3 with Go

2023-06-09

Design approved

Changes will be ported to spec repo

2023-05-12

Design work started

Decided against adapting unified test format to accommodate special test needs for this project due to implementation complexity involved

Server Compat:
- 4.4
- 5.0
- 5.3
Driver Changes:
Not Needed
Upstream Changes Summary:


Details TBD


Driver Compliance:


Key	Status/Resolution	FixVersion
CDRIVER-4099	Fixed	1.26.0
CXX-2320	Works as Designed
CSHARP-3757	Done	2.26.0
GODRIVER-2101	Fixed	1.13.0, 1.13.1
JAVA-4254	Done	5.2.0
NODE-3470	Fixed	6.4.0
MOTOR-792	Duplicate
PYTHON-2834	Fixed	4.7
PHPLIB-1459	Fixed	1.20.0
RUBY-2748	Fixed	2.20.0
RUST-935	Fixed	2.8.0
SWIFT-1279	Won't Do


Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None
Goal Tier(s):
None

There are several scenarios in which it would be useful to redirect reads or writes to a different mongos.

A MongoDB sharded cluster deployment may find itself in a situation where a mongos reports itself as healthy but is unable to execute any queries. The driver retried the failing queries but, in a number of cases, selected the same mongos that had failed the first attempt, so the retry failed for the same reason and the error was propagated to the application.
Currently, when the driver is in a sharded topology, the server selection spec requires a random server to be selected for each operation. This permits the same failed mongos to be selected for both an operation and its retry, so the query fails even when the deployment contains healthy mongoses that could have executed it successfully.
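
To make the failure mode concrete, the following is a minimal sketch of how often a retry lands on the mongos that just failed. It assumes a simplified model in which the first attempt and the retry are independent, uniformly random picks from the same list of eligible mongoses; the function name `same_mongos_retry_probability` and the host names are hypothetical, and the real server selection algorithm has additional steps that this model ignores.

```python
from fractions import Fraction
from itertools import product


def same_mongos_retry_probability(mongoses):
    """Exact chance that attempt and retry pick the same mongos.

    Simplifying assumption: both picks are independent and uniform
    over the same list of eligible mongoses.
    """
    # Enumerate every (first attempt, retry) pair and count the collisions.
    pairs = list(product(mongoses, repeat=2))
    same = sum(1 for first, retry in pairs if first == retry)
    return Fraction(same, len(pairs))


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for hosts in (["mongos-a", "mongos-b"], ["mongos-a", "mongos-b", "mongos-c"]):
        print(len(hosts), "mongoses:", same_mongos_retry_probability(hosts))
```

Under this model the retry reuses the failed mongos with probability 1/n for n eligible mongoses, which is one time in two for a two-mongos deployment.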

The suggested improvement is for the driver, when in a sharded cluster topology, to:

- Track whether a server selection request is for the first attempt or for a retry.
- Track the server used for the first attempt.
- When selecting the server for the retry, if there are multiple eligible mongoses, select randomly from the mongoses other than the one used for the first attempt.
- Bonus (nice to have): determine whether a mongos is healthy before making the attempt and, if it is unhealthy, exclude it from selection.
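
The first three steps above can be sketched as follows. This is an illustrative outline, not the specification's pseudocode or any driver's implementation: `RetryableError`, `select_mongos`, `run_with_retry`, and the host names are all hypothetical, selection is reduced to a uniform random choice, and the bonus health check is not covered.

```python
import random


class RetryableError(Exception):
    """Hypothetical stand-in for whatever a driver treats as retryable."""


def select_mongos(eligible, deprioritized=(), rng=random):
    """Pick a mongos at random, avoiding deprioritized ones when possible."""
    if not eligible:
        raise ValueError("no eligible mongos")
    candidates = [s for s in eligible if s not in deprioritized]
    if not candidates:
        # Only the previously used mongos is available, so fall back to it.
        candidates = list(eligible)
    return rng.choice(candidates)


def run_with_retry(operation, eligible, rng=random):
    """Run operation(server) once, retrying once on a different mongos."""
    # First attempt: remember which server was used.
    first = select_mongos(eligible, rng=rng)
    try:
        return operation(first)
    except RetryableError:
        # Retry: deprioritize the server that just failed.
        retry = select_mongos(eligible, deprioritized=[first], rng=rng)
        return operation(retry)


if __name__ == "__main__":
    def operation(server):
        # Simulate a mongos that looks healthy but cannot run queries.
        if server == "mongos-a":
            raise RetryableError(server)
        return "ok from " + server

    print(run_with_retry(operation, ["mongos-a", "mongos-b"]))
```

With two mongoses where only `mongos-a` fails, the sketch always ends on `mongos-b`: either the first attempt goes there directly, or the retry is steered away from `mongos-a`. With a single mongos the retry necessarily goes back to the same server, which is what "if possible" in the epic name allows for.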

Cast of Characters:
Product Manager for Feature: alex.bevilacqua@mongodb.com
Program Manager: tom.selander@mongodb.com
Engineering Lead: dmitry.rybakov@mongodb.com

causes

DRIVERS-2901 Clarify the intent behind the list of deprioritized mongos'es and fix the pseudocode

Needs Triage

depends on

SERVER-53287 Improve cluster/mongos health observability

Closed

is related to

DRIVERS-1842 Drivers should retry authentication errors when connection handshake fails

Backlog

DRIVERS-2140 Clarify Auth Spec and Clean Up Error Section

Backlog

related to

SERVER-50459 Include "source" field in error responses from mongos

Backlog

DRIVERS-2828 Update prose tests for mongos deprioritization during retryable ops

Implementing

split to

PHPLIB-1459 Direct read/write retries to another mongos if possible

Closed

CDRIVER-4099 Direct read/write retries to another mongos if possible

Closed

CSHARP-3757 Direct read/write retries to another mongos if possible

Closed

CXX-2320 Direct read/write retries to another mongos if possible

Closed

GODRIVER-2101 Direct read/write retries to another mongos if possible

Closed

JAVA-4254 Direct retries to another mongos if one is available

Closed

MOTOR-792 Direct read/write retries to another mongos if possible

Closed

NODE-3470 Direct read/write retries to another mongos if possible

Closed

PYTHON-2834 Direct read/write retries to another mongos if possible

Closed

RUBY-2748 Direct read/write retries to another mongos if possible

Closed

RUST-935 Direct read/write retries to another mongos if possible

Closed

(1 related to, 11 split to)

Assignee: Dmitry Rybakov
Reporter: Oleg Pudeyev (Inactive)
Engineering Lead: Jeffrey Yemin
Program Manager: Tom Selander
Product Manager: Alex Bevilacqua
Goal DRI(s): None
Votes: 6
Watchers: 31

Created: Feb 19 2021 01:44:34 PM UTC
Updated: Jul 19 2024 02:45:37 AM UTC
Resolved: Jul 09 2024 04:45:35 PM UTC
Target start: None
Target end: None
Start date: 08/May/23
End date: None
Confidence Status Last Update: None
Goal Completion Date: None
