[DRIVERS-2170] Errors on retryable ops should indicate originating server when possible
Created: 08/Oct/19  Updated: 11/Jan/24

Status: Implementing
Project: Drivers
Component/s: Retryability
Fix Version/s: None

Type: Spec Change
Priority: Major - P3
Reporter: Jeremy Mikola
Assignee: Jamis Buck
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Issue split
split to CDRIVER-4753 Errors on retryable ops should indica... Backlog
split to CXX-2776 Errors on retryable ops should indica... Backlog
split to JAVA-5225 Errors on retryable ops should indica... Closed
split to MOTOR-1201 Errors on retryable ops should indica... Closed
split to NODE-5719 Errors on retryable ops should indica... Closed
split to PYTHON-4017 Errors on retryable ops should indica... Closed
split to RUBY-3340 Errors on retryable ops should indica... Closed
split to RUST-1787 Errors on retryable ops should indica... Closed
split to CSHARP-4826 Errors on retryable ops should indica... Scheduled
split to GODRIVER-3031 Errors on retryable ops should indica... Scheduled
split to PHPLIB-1296 Errors on retryable ops should indica... Scheduled
Related
related to RUBY-2004 Indicated attempt not always correct Closed
related to PYTHON-4137 Return originating server on all serv... Backlog
related to RUBY-1744 When handshake/auth fails, indicate w... Closed
related to RUBY-1905 Indicate which server operations were... Closed
related to RUBY-1954 Integration test for exceptions refer... Closed
Driver Changes: Needed
Quarter: FY24Q4
Downstream Changes Summary:

Summary of necessary driver changes

This only affects drivers that include server information (like the server name or address) in exceptions and error messages.

Any driver that includes server information in an exception or error message must ensure that the exception or message refers to the server that originated the error. For example, during a retry of a failed read or write, the original server must not simply be carried over to a subsequent error.

Engineering Lead: Andreas Braun
Start date:
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-4753 Backlog
CXX-2776 Backlog
CSHARP-4826 Scheduled
GODRIVER-3031 Scheduled
JAVA-5225 Works as Designed
NODE-5719 Works as Designed
MOTOR-1201 Duplicate
PYTHON-4017 Duplicate
PHPLIB-1296 Scheduled
RUBY-3340 Works as Designed
RUST-1787 Works as Designed

 Description   

While discussing some of the error reporting improvements that oleg.pudeyev has added to the Ruby driver, I learned that Ruby goes out of its way to report the specific server that produced the error ultimately raised when a retryable operation fails on every attempt. This removes the ambiguity where the reported error could correspond to either the first or the second attempt. Executing Retryable Write Commands states:

If an error would not allow the caller to infer that an attempt was made (e.g. connection pool exception originating from the driver), the original error should be raised. If the retry failed due to another retryable error or some other error originating from the server, that error should be raised instead as the caller can infer that an attempt was made and the second error is likely more relevant (with respect to the current topology state).
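The rule quoted above can be sketched roughly as follows. This is only an illustration, not any driver's actual code: the `DriverError`, `PoolError`, and `ServerError` classes and the `error_to_raise` function are hypothetical names invented for this example.

```python
class DriverError(Exception):
    """Hypothetical base class for the errors in this sketch."""


class PoolError(DriverError):
    """Hypothetical driver-side error: no attempt reached a server."""


class ServerError(DriverError):
    """Hypothetical error that originated from a server (retryable or not)."""


def error_to_raise(first_error: DriverError, retry_error: DriverError) -> DriverError:
    """Pick which error to surface after the retry has also failed."""
    if isinstance(retry_error, ServerError):
        # The caller can infer that a second attempt was made, and the
        # second error is likely more relevant to the current topology.
        return retry_error
    # The retry never reached a server, so report the original error.
    return first_error
```

Whichever error is chosen, any server information attached to it has to describe the server that produced that particular error, which is the point of this ticket.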

While Command Monitoring can also be used to infer the error's originating server, we can't rely on that being enabled in production systems where applications might only be logging an exception.

The Retryable Reads spec does not appear to have any section discussing error reporting (possibly an oversight), but drivers should be able to implement this for retryable reads as well.

If a driver already attaches a server description (or equivalent) to its exceptions for server-side errors, I think it could implement this ticket just by ensuring that the attached server is always the originating server of the error being reported. For example, a driver that always attaches the first attempt's server to such an exception should be changed to attach the first or second attempt's server, whichever produced the reported error.
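A minimal sketch of that idea, assuming a driver that decorates its exceptions with server information. The `Server`, `OperationError`, `attempt_on`, and `run_with_retry` names are hypothetical, and for brevity the sketch treats every `OperationError` as retryable and retries exactly once:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a driver's server description."""
    address: str


class OperationError(Exception):
    """Hypothetical driver error that can carry its originating server."""

    def __init__(self, message: str, server: Optional[Server] = None):
        super().__init__(message)
        self.server = server

    def __str__(self) -> str:
        base = super().__str__()
        return f"{base} (on {self.server.address})" if self.server else base


def attempt_on(server: Server, operation: Callable[[Server], Any]) -> Any:
    """Run one attempt and tag any failure with the server that produced it."""
    try:
        return operation(server)
    except OperationError as exc:
        exc.server = server  # tag per attempt, never carried over
        raise


def run_with_retry(first: Server, second: Server,
                   operation: Callable[[Server], Any]) -> Any:
    """Try `first`, then retry once on `second` (assumption: always retryable)."""
    try:
        return attempt_on(first, operation)
    except OperationError:
        # An error raised here is tagged with `second`, not `first`.
        return attempt_on(second, operation)
```

Because the server is attached inside `attempt_on` rather than once up front, an error from the retry cannot accidentally report the first attempt's server. In this sketch the first attempt's error remains reachable through the raised exception's `__context__`.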



 Comments   
Comment by Githook User [ 09/Jan/24 ]

Author: Jamis Buck <jamisbuck@gmail.com> (username: jamis)

Message: DRIVERS-2170 Server info on retryable errors must reflect the originating server (#1480)

  • DRIVERS-2170 server info on retryable errors must reflect originating server
  • update changelogs
Comment by Jamis Buck [ 11/Dec/23 ]

PR: https://github.com/mongodb/specifications/pull/1480

Comment by Oleg Pudeyev (Inactive) [ 08/Oct/19 ]

FYI, in Ruby this diagnostic is applied to all failing operations, not just those eligible to be retried.

Generated at Thu Feb 08 08:24:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.