Type: Task
Resolution: Unresolved
Priority: Unknown
Component/s: CSOT, Retryability
Summary
The retryable writes pseudocode includes logic to return previousError when CSOT is enabled and the operation times out:
    } else if (isExpired(timeoutMS)) {
      /* CSOT is enabled and the operation has timed out. */
      throw previousError;
    }
(This was added in https://github.com/mongodb/specifications/commit/343ff9a3864e7141c5a056fbf19e71ca57e65740 but has no unified spec tests.)
In the pseudocode, previousError is a regular retryable write error, whereas currentError is a CSOT timeout error. Throwing previousError contradicts the CSOT requirement that CSOT errors be distinguishable:
If the timeoutMS option is set and the timeout expires, drivers MUST abort all blocking work and return control to the user with an error. This error MUST be distinguished in some way (e.g. custom exception type) to make it easier for users to detect when an operation fails due to a timeout.
We should investigate how drivers can reconcile this with CSOT:
For example, the Go Driver can use error wrapping: https://go.dev/play/p/-YicW6L9Uw6
PR thread: https://github.com/mongodb/specifications/pull/1878#discussion_r2744107118
Motivation
Who is the affected end user?
Users who rely on CSOT with retryable operations. Downstream teams implementing the spec may also be affected if they implement the pseudocode 1:1.
How does this affect the end user?
Users may be confused when a timeout error occurs during retry but they cannot determine the underlying cause.
How likely is it that this problem or use case will occur?
This seems like an edge case. It requires CSOT to be enabled, a retryable error on the first attempt, and a timeout during the retry. It is more likely with short timeoutMS values or in high-latency environments.
If the problem does occur, what are the consequences and how severe are they?
Users may have difficulty debugging failures.
Is this issue urgent?
No. This is a spec clarification and test coverage gap.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
Partially. The investigation may result in spec clarification (prose changes) and new test coverage. No functional driver changes are expected unless drivers are found to be non-conformant (for example, the Node and Go drivers).
Acceptance Criteria
Answer:
- Is the pseudocode reasonable?
- How should drivers implement this?
Update "drivers changes" if more is required, e.g. test / spec syncs.