[DRIVERS-2063] Handle write errors differently depending on whether the outcome is known Created: 24/Apr/19  Updated: 25/Oct/22

Status: Backlog
Project: Drivers
Component/s: Retryability
Fix Version/s: None

Type: Spec Change Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
Driver Changes: Needed

 Description   

When a driver attempts to send a write command to the server, there are circumstances in which it is always safe to retry, e.g. a server selection error, a DNS error, a network error while writing to the socket, or a server error such as NotMaster or InterruptedDueToShutdown. In other cases the write can only be retried if the write satisfies the criteria in the Retryable Writes Spec, e.g. a network error while awaiting the server reply.
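The two categories above can be sketched as a small decision function. This is a minimal illustration, not driver code: `FailurePhase`, `may_retry`, and the `spec_criteria_met` parameter are hypothetical names, and the sets only contain the examples listed in this description.

```python
from enum import Enum, auto


class FailurePhase(Enum):
    """Hypothetical phases at which a write attempt can fail on the client."""
    SERVER_SELECTION = auto()  # no server was selected
    DNS_RESOLUTION = auto()    # the host name could not be resolved
    SOCKET_WRITE = auto()      # network error while writing the request
    AWAITING_REPLY = auto()    # network error while awaiting the reply


# Client-side failures the description lists as always safe to retry.
ALWAYS_SAFE_PHASES = {
    FailurePhase.SERVER_SELECTION,
    FailurePhase.DNS_RESOLUTION,
    FailurePhase.SOCKET_WRITE,
}

# Server errors the description lists as always safe to retry.
ALWAYS_SAFE_SERVER_ERRORS = {"NotMaster", "InterruptedDueToShutdown"}


def may_retry(*, phase=None, server_error=None, spec_criteria_met=False):
    """Return True if the failed write may be retried.

    spec_criteria_met is an assumed flag meaning "this write satisfies the
    criteria in the Retryable Writes Spec"; computing it is out of scope here.
    """
    if phase in ALWAYS_SAFE_PHASES or server_error in ALWAYS_SAFE_SERVER_ERRORS:
        return True
    # Outcome unknown: only retry when the Retryable Writes Spec allows it.
    return spec_criteria_met
```

For example, `may_retry(phase=FailurePhase.AWAITING_REPLY)` is `False` unless `spec_criteria_met=True` is also passed.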

Consider improvements to driver error handling that account for the difference between these two categories of error.



 Comments   
Comment by Tom Selander [ 25/Oct/22 ]

Leads Triage: Analysis looks good, but sending to the backlog since this doesn't seem urgent.

Comment by Valentin Kavalenka [ 12/Oct/22 ]

Now that we have the NoWritesPerformed error label added by the server (see SERVER-66116, SERVER-66479, DRIVERS-2327), we have a clear way of distinguishing definite errors (the data was definitely not modified) from indefinite errors (the data might have been modified):

  • any client error that happened before a client started sending bytes of a request to a server is a definite one (see note 1);
  • any server error labeled with NoWritesPerformed is a definite one (see note 2).
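The two rules above can be written down as a single predicate. This is only a sketch under stated assumptions: `WriteError` is a hypothetical, simplified record of a failed attempt, not a type from any driver, and it ignores the retry caveat in note 1.

```python
from dataclasses import dataclass

NO_WRITES_PERFORMED = "NoWritesPerformed"


@dataclass
class WriteError:
    """Hypothetical, simplified view of one failed write attempt."""
    from_server: bool                    # True for a server error, False for a client error
    request_bytes_sent: bool = False     # only meaningful for client errors
    labels: frozenset = frozenset()      # error labels attached to a server error


def is_definite(error: WriteError) -> bool:
    """Return True if the error guarantees that no data was modified."""
    if error.from_server:
        # Server error: definite only when labeled NoWritesPerformed.
        return NO_WRITES_PERFORMED in error.labels
    # Client error: definite only if no request bytes had been sent yet.
    return not error.request_bytes_sent
```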

Users may need to know when an error is definite; whether that is for the purpose of retrying in user code (see note 3) or for forming an expectation for reads, as in SERVER-66116, is irrelevant. By accepting SERVER-66116 as a bug, we acknowledged both that users may need that information and that we expect them to obtain it by analyzing errors. Currently, there is no straightforward and robust way for users to decide whether an error is definite. Users have to analyze each error, possibly in the context of what the application is doing, to make a decision. Each application that does this does it differently and likely introduces its own bugs in the process.

We can relieve users of this burden by documenting and exposing the NoWritesPerformed label to them for server errors, and by providing a clear way to see whether a client error is a definite one. One way to do the latter is to add the NoWritesPerformed label to definite client errors in the driver (see note 4); this way users will have a single criterion for deciding whether an error is definite. It is possible, but not necessarily the case, that some drivers expose client errors that are not specific to the driver. If there are such errors, then a driver likely cannot attach to them information on whether they are definite without wrapping them in, or replacing them with, a driver-specific exception. Doing so is likely a breaking change, which means that such exceptions will likely have to be left as is. Hopefully, this is not a common case.
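A rough sketch of what the driver-side labeling could look like. Everything here is hypothetical: `DriverError`, `label_client_error`, and `no_writes_performed` are illustrative names, not part of any driver's API, and the decision of whether request bytes were sent is assumed to be known to the caller.

```python
NO_WRITES_PERFORMED = "NoWritesPerformed"


class DriverError(Exception):
    """Hypothetical driver-specific exception that can carry error labels."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self._labels = set(labels)

    def add_error_label(self, label):
        self._labels.add(label)

    def has_error_label(self, label):
        return label in self._labels


def label_client_error(exc, request_bytes_sent):
    """Driver side: label a client error raised before any request bytes were sent."""
    if not request_bytes_sent:
        exc.add_error_label(NO_WRITES_PERFORMED)
    return exc


def no_writes_performed(exc):
    """User side: one check covers both server errors and labeled client errors."""
    return exc.has_error_label(NO_WRITES_PERFORMED)
```

With this shape, user code does not need to know whether the error originated on the client or on the server.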


1 Care must be taken when dealing with retries: a client error happening after the first attempt and before sending a retry to the server can be considered definite only if the first attempt failed with a definite error.
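The retry caveat in note 1 can be sketched as follows. `overall_definite` is a hypothetical helper that takes, for each attempt in order, whether that attempt's error would be definite when considered in isolation.

```python
def overall_definite(attempts_definite):
    """Return True only if every attempt failed with a definite error.

    attempts_definite: list of booleans, one per attempt, in order.
    A single indefinite attempt makes the overall outcome indefinite,
    even if a later error happened before any retry bytes were sent.
    """
    return len(attempts_definite) > 0 and all(attempts_definite)
```

For instance, `overall_definite([False, True])` is `False`: the first attempt might have modified the data, so a definite-looking error on the retry cannot be reported as definite.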

2 Note that currently NoWritesPerformed is added by the server only to errors that are also labeled with RetryableWriteError (thanks, shane.harvey@mongodb.com, for finding this out). I don't think this will need to change if we decide to proceed with the idea expressed in this comment: if there are currently definite server errors that are not labeled with RetryableWriteError, then that is a bug, because they are trivially retryable.

3 It is worth noting that the driver retries we currently have, as well as user code retries that rely on driver-provided information on whether errors are definite, are there not for safety reasons but for performance and a better user experience. Users who have business operations that must eventually be done no matter what will have to find ways to retry those business operations in user code, regardless of whether drivers retry commands for them or whether errors are definite.

4 Some client exceptions, e.g., MongoConnectionPoolClearedException and MongoTimeoutException, are always definite; others may be definite depending on the situation. This requires doing an analysis on the driver side, but the benefit is that it is done once for all users, saving them from doing the same while having access to less information.

Comment by A. Jesse Jiryu Davis [ 24/Apr/19 ]

This idea was spun off from WRITING-3721 - we considered the possibility of changing which writes are retryable in that project, but it's not in scope there. Hence, make a new ticket.

Comment by Bernie Hackett [ 24/Apr/19 ]

jesse, can you provide some context for this ticket?

Generated at Thu Feb 08 08:24:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.