Go Driver / GODRIVER-2932

Timeout errors that occur while starting a change stream are confusing and non-deterministic

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Change Streams

      A call to Watch can time out because the database takes a long time to assemble the initial batch of change stream messages. When that happens, one of several timeout scenarios occurs, and which one occurs is non-deterministic:

      1. A server-side timeout occurs (because the user or CSOT (client-side operations timeout) set MaxTimeMS on the "aggregate" command), and the server returns a MaxTimeMSExpired (code 50) error. The change stream "aggregate" execute loop then retries the "aggregate" command, which almost certainly fails because the client-side timeout is about to expire. Depending on the exact timing, Watch returns one of several client-side timeout errors:
        1. A WaitQueueTimeoutError, returned while waiting to check out a connection.
        2. An ErrDeadlineWouldBeExceeded, returned before the retried "aggregate" command is sent.
        3. Other client-side timeout errors, which are possible but much less likely.
      2. A client-side timeout occurs while waiting for the server to respond. The change stream "aggregate" execute loop retries the "aggregate" command, which fails while trying to check out another connection. Watch returns a WaitQueueTimeoutError, even though the expected error is something like "incomplete read of message header: context deadline exceeded".

      All of these errors share the same underlying cause: a timeout. Surfacing that one cause as so many seemingly unrelated timeout errors is extremely confusing.

      Definition of done:

      • Watch returns an error that accurately reports the underlying failure when there is a server-side or client-side timeout.

            Assignee: Unassigned
            Reporter: Matt Dale (matt.dale@mongodb.com)
            Votes: 0
            Watchers: 1
