Client-level bulk writes must retry getMores on overload errors


    • Type: Spec Change
    • Resolution: Unresolved
    • Priority: Unknown
    • Component/s: Backpressure

      Summary

      What is the problem or use case, what are we trying to achieve?
      (From valentin.kovalenko@mongodb.com via Slack)
      The client-level bulk write operation (https://github.com/mongodb/specifications/blob/master/source/crud/bulk-write.md) is a write operation, but it is executed as multiple commands: some are write commands, and some are read commands (the getMores that read the server response cursor).
      Retrying the underlying write commands is controlled by the retryWrites connection string option under the write retry policy (https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.md).

      This must mean that when the client-level bulk write operation executes its underlying commands, it must take into account both the retryWrites and the retryReads connection string options, depending on whether the command being executed is a write or a read.
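
      A minimal sketch of the option selection described above, not taken from any driver. The names RetryOptions, WRITE_COMMANDS, READ_COMMANDS, and is_retry_enabled are hypothetical, and the classification of bulkWrite as the write command and getMore as the read command follows this ticket's reading rather than settled spec language.

```python
from dataclasses import dataclass

# Hypothetical classification of the commands a client-level bulk write sends.
# Assumption from this ticket: the initial bulkWrite is a write command and the
# follow-up getMore on the results cursor is a read command.
WRITE_COMMANDS = frozenset({"bulkWrite"})
READ_COMMANDS = frozenset({"getMore"})


@dataclass(frozen=True)
class RetryOptions:
    """Hypothetical holder for the two connection string options."""
    retry_writes: bool = True
    retry_reads: bool = True


def is_retry_enabled(command_name: str, options: RetryOptions) -> bool:
    """Pick the retry option that governs the given underlying command."""
    if command_name in WRITE_COMMANDS:
        return options.retry_writes
    if command_name in READ_COMMANDS:
        return options.retry_reads
    raise ValueError(f"unexpected command in bulk write: {command_name!r}")
```

      With retryWrites=true and retryReads=false, this sketch would allow retrying the bulkWrite command but not the getMore.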

      Motivation

      Who is the affected end user?

      Who are the stakeholders?
      Users of client-level bulk writes who experience overload errors while the server response cursor is being read.

       

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?
      Overload errors encountered while reading the server response cursor will not be retried, leading to more application errors.

      How likely is it that this problem or use case will occur?

      Main path? Edge case?
      Whenever an overload error is encountered while reading the cursor. Not often, but not necessarily rare.

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?
      More application errors and worse performance, generally.

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?
      This isn't a blocker, but it should be done in the near future.

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?
      No.

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?
      No.

      Acceptance Criteria

      What specific requirements must be met to consider the design phase complete?

      • Change the spec language to say that drivers "MUST retry getMores on overload errors"
      • Add spec/prose tests to verify correct behavior.
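
      The behavior the criteria above ask for could look roughly like the following. This is an illustrative sketch only: OverloadError, get_more_with_retry, send_get_more, max_attempts, and base_delay are hypothetical names, and the attempt limit and exponential delay are assumptions rather than values from the spec.

```python
import time
from typing import Callable, List


class OverloadError(Exception):
    """Hypothetical stand-in for an overload error reported by the server."""


def get_more_with_retry(
    send_get_more: Callable[[], List[dict]],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Run one getMore, retrying only when it fails with an overload error.

    send_get_more is a caller-supplied function that sends a single getMore
    and returns the next batch of results.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_get_more()
        except OverloadError:
            if attempt == max_attempts:
                # Attempts exhausted: surface the error to the application.
                raise
            # Assumed backoff: double the delay after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

      A prose test could drive this with a send_get_more that fails a fixed number of times and then succeeds, asserting both the returned batch and the number of attempts made.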

            Assignee:
            Unassigned
            Reporter:
            Noah Stapp