Detailed steps to reproduce the problem?
The gist linked here illustrates that setting an operation-level timeout in v1 does not result in maxTimeMS being set on the command message.
History
CSOT context wrapping was implemented in GODRIVER-2496 while the CSOT spec was still being modified. Wrapping prevents operation-level server-side timeouts, a feature that was likely overlooked. However, it is worth noting that:
1. Adding operation-level server-side timeouts would technically be a breaking change, since users might be relying on the context.DeadlineExceeded error and the closed connection that result from timing out the read portion of the round trip.
2. Operation-level timeouts may still cause the "read" portion of a round trip to time out, closing the connection to the server.
Timing out the read portion of the round trip is the source of the high connection churn described in HELP-56519. Adding operation-level server-side timeouts could reduce this churn by letting the server time out the operation before the client-side read deadline fires. However, this only holds if rttMin (or rtt90) is large enough, potentially just positive, to leave time for the server's error reply to arrive before the read deadline:
```go
func Execute(ctx context.Context) {
	for { // While retries are possible
		maxTimeMS := getRemainingTimeout(ctx) - rttMin

		// We would construct a wire message with maxTimeMS.
		conn.Write(createWireMessage(maxTimeMS))

		// Read until the buffer is filled / the context times out. Once timed
		// out, the connection is closed and the DB will presumably terminate
		// the op.
		conn.Read(getRemainingTimeout(ctx))
	}
}
```
Definition of done: what must be done to consider the task complete?
The v2 PR proposed in GODRIVER-2348 ensures that operation-level timeouts are included in the calculation of maxTimeMS. This portion of the PR should be back-ported to v1 after considering the breaking change described in point (1) of the "History" section.
The exact Go version used, with patch level:
go version go1.21.4 darwin/arm64
The exact version of the Go driver used:
1.11.7
Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.
local, 8.0.0-alpha-4884-g0c18d0a, replica set, not load balanced
The operating system and version (e.g. Windows 7, OSX 10.8, ...)
OSX 14.3
Security Vulnerabilities
NA
- is caused by: GODRIVER-2496 Simplify maxTimeMS appension (Closed)
- is fixed by: GODRIVER-2348 Make CSOT feature-gated behavior the default (Closed)
- related to:
  - GODRIVER-3172 Read responses in the background after an operation timeout (Closed)
  - GODRIVER-3152 Set maxTimeMS to minimize connection churn (Closed)
  - GODRIVER-2762 Use minimum RTT for CSOT maxTimeMS calculation instead of 90th percentile (Closed)