- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Change Streams
Calling Watch can time out if the database takes a long time to assemble the initial batch of change stream messages (see below for specifics about how that can happen). When it does, one of a few timeout conditions occurs non-deterministically:
- A server-side timeout happens (because the user or CSOT set MaxTimeMS on the "aggregate" command) and the server returns a MaxTimeMSExpired (code 50) error. The change stream "aggregate" execute loop retries the "aggregate" command, which will almost certainly fail because the client-side timeout is about to expire. Watch then returns one of a few client-side timeout errors depending on the exact timing:
  - A WaitQueueTimeoutError while waiting to check out a connection.
  - An ErrDeadlineWouldBeExceeded before sending the retried "aggregate" command.
  - Other client-side timeout errors are possible, but much less likely.
- A client-side timeout happens while waiting for the server to respond. The change stream "aggregate" execute loop retries the "aggregate" command, which fails while trying to check out another connection. Watch returns a WaitQueueTimeoutError, when the expected error is something like "incomplete read of message header: context deadline exceeded".
All of those errors have the same underlying cause: a timeout. Surfacing it as so many seemingly unrelated timeout errors is extremely confusing.
Definition of done:
- Watch returns an error that accurately reports the underlying failure when there is a server-side or client-side timeout.
- is related to GODRIVER-2929 Improve error messaging by wrapping errors in Go Driver 1.x (Backlog)