Type: Spec Change
Resolution: Unresolved
Priority: Unknown
Component/s: Backpressure
Summary
What is the problem or use case, what are we trying to achieve?
(From valentin.kovalenko@mongodb.com via Slack)
The client-level bulk write operation (https://github.com/mongodb/specifications/blob/master/source/crud/bulk-write.md) is a write operation, but it is carried out by running multiple commands: some are write commands (bulkWrite) and some are read commands (getMore on the response cursor).
- Retrying the underlying write commands is controlled by the retryWrites connection string option under the write retry policy (https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.md).
- The underlying read commands are not retried under the read retry policy (https://github.com/mongodb/specifications/blob/master/source/retryable-reads/retryable-reads.md).
- However, the underlying read commands are retried under the overload retry policy (https://github.com/mongodb/specifications/blob/master/source/client-backpressure/client-backpressure.md#overload-retry-policy and https://github.com/mongodb/specifications/blob/master/source/crud/bulk-write.md#retrying-getmores). The overload retry policy states that "A retry attempt will only be permitted if...the command is a read and retryReads is enabled".
This requirement implies that, when the client-level bulk write operation executes its underlying commands, it must consult retryWrites for the write commands and retryReads for the read commands, even though the operation as a whole is a write.
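The per-command decision described above can be sketched as follows. This is a minimal, hypothetical Python sketch, not driver or specification code: `CommandKind`, `RetryOptions`, `command_kind`, and `overload_retry_permitted` are illustrative names, and the sketch models only the retryWrites/retryReads check. The other conditions of the overload retry policy (elided by the "..." in the quoted spec text) are deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class CommandKind(Enum):
    """Hypothetical classification of the commands a client bulk write sends."""
    WRITE = "write"  # e.g. the bulkWrite command itself
    READ = "read"    # e.g. a getMore on the bulkWrite response cursor


@dataclass(frozen=True)
class RetryOptions:
    """Values of the retryWrites / retryReads connection string options."""
    retry_writes: bool
    retry_reads: bool


def command_kind(command_name: str) -> CommandKind:
    """Hypothetical mapping from command name to kind within a client bulk write."""
    return CommandKind.READ if command_name == "getMore" else CommandKind.WRITE


def overload_retry_permitted(kind: CommandKind, options: RetryOptions) -> bool:
    """Return whether an overload-error retry of one underlying command is allowed.

    Simplified: checks only the option matching the command's kind
    (retryWrites for writes, retryReads for reads).
    """
    if kind is CommandKind.WRITE:
        return options.retry_writes
    return options.retry_reads


if __name__ == "__main__":
    opts = RetryOptions(retry_writes=True, retry_reads=False)
    print(overload_retry_permitted(command_kind("bulkWrite"), opts))  # True
    print(overload_retry_permitted(command_kind("getMore"), opts))    # False
```

Under this reading, disabling retryReads alone is enough to stop getMores on the bulkWrite response cursor from being retried on overload errors, even though the application only asked for a write.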
Motivation
Who is the affected end user?
Who are the stakeholders?
Users of client bulkWrite who encounter overload errors while reading the server's response cursor.
How does this affect the end user?
Are they blocked? Are they annoyed? Are they confused?
Overload errors encountered while reading the server's response cursor are not retried, so more errors surface to the application.
How likely is it that this problem or use case will occur?
Main path? Edge case?
Whenever an overload error occurs while reading the response cursor. Not frequent, but not necessarily rare.
If the problem does occur, what are the consequences and how severe are they?
Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?
More application errors and worse performance, generally.
Is this issue urgent?
Does this ticket have a required timeline? What is it?
This isn't a blocker, but it should be done in the near future.
Is this ticket required by a downstream team?
Needed by e.g. Atlas, Shell, Compass?
No.
Is this ticket only for tests?
Does this ticket have any functional impact, or is it just test improvements?
No.
Acceptance Criteria
What specific requirements must be met to consider the design phase complete?
- Change the spec language to say that drivers "MUST retry getMores on overload errors"
- Add spec/prose tests to verify correct behavior.