Type: Spec Change
Resolution: Unresolved
Priority: Unknown
Component/s: CSOT
Summary
The CSOT spec requires both applying client-side timeouts via a language-specific timeout mechanism and applying server-side timeouts by sending maxTimeMS with commands. The spec suggests that a primary purpose for sending maxTimeMS is to prevent connection churn:
"When constructing a command, drivers use the timeoutMS option to derive a value for the maxTimeMS command option and the socket timeout ... To allow the server to gracefully error and avoid churn, drivers must account for the network round trip in the maxTimeMS calculation."
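The derivation described in the quoted passage can be sketched roughly as follows. This is a minimal illustration, not the spec's normative algorithm: the function name `derive_max_time_ms`, the `OperationTimeout` exception, and the choice to fail client-side when no budget remains are assumptions made for the example, and the round-trip estimate is simply passed in as a number.

```python
class OperationTimeout(Exception):
    """Hypothetical client-side timeout error used only in this sketch."""


def derive_max_time_ms(remaining_timeout_ms: float, rtt_ms: float) -> int:
    """Derive a maxTimeMS value from the remaining timeoutMS budget.

    remaining_timeout_ms: time left in the operation's timeoutMS budget.
    rtt_ms: the driver's round-trip estimate for the selected server.
    """
    # Reserve the round trip so the server can report the timeout itself
    # before the client-side socket timeout fires and closes the connection.
    max_time_ms = int(remaining_timeout_ms - rtt_ms)
    if max_time_ms <= 0:
        # maxTimeMS=0 means "no limit" on the server, so never send it;
        # in this sketch we fail locally instead.
        raise OperationTimeout("timeoutMS budget exhausted before sending")
    return max_time_ms


# 1000 ms left and a 15 ms round-trip estimate leave 985 ms for the server.
command = {"getMore": 1234, "collection": "events"}
command["maxTimeMS"] = derive_max_time_ms(1000, 15)
```

The point of the subtraction is that the server-side limit expires slightly before the client-side one, so a slow command ends with a server error on a connection that remains usable.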
However, limiting the lifetime of an entire cursor doesn't mitigate connection churn, whereas limiting the duration of a single getMore can. Sending maxTimeMS with each getMore therefore seems to make more sense than applying it to the entire cursor lifetime. We should amend the CSOT spec to make the default timeoutMode=ITERATION. Users who specifically want the CSOT timeout to control the cursor lifetime can opt in with timeoutMode=CURSOR_LIFETIME. They can also control cursor lifetime by explicitly closing the cursor.
Note that the above is not a problem for drivers that implement CSOT and have a timeoutMode option. However, timeoutMode is currently optional and is not implemented in some drivers that implement CSOT. The CSOT spec should require a safer default behavior for drivers that do not implement the timeoutMode option.
Motivated by GODRIVER-2944.
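To make the difference between the two timeout modes concrete, here is a small simulation. It is only a sketch under simplifying assumptions: it uses a simulated clock instead of a real server, ignores client-side processing time between fetches, and the names `iterate_cursor` and `SimulatedTimeout` are hypothetical. In CURSOR_LIFETIME mode one timeoutMS budget covers every fetch; in ITERATION mode each fetch gets a fresh budget.

```python
ITERATION = "ITERATION"
CURSOR_LIFETIME = "CURSOR_LIFETIME"


class SimulatedTimeout(Exception):
    """Hypothetical error standing in for a CSOT timeout."""


def iterate_cursor(batch_durations_ms, timeout_ms, timeout_mode):
    """Simulate fetching batches; return the number of batches fetched.

    batch_durations_ms: simulated duration of each find/getMore.
    """
    now = 0                      # simulated clock, in ms
    deadline = now + timeout_ms  # budget starts when the cursor is created
    fetched = 0
    for duration in batch_durations_ms:
        if timeout_mode == ITERATION:
            # Each fetch gets a fresh timeoutMS budget.
            deadline = now + timeout_ms
        if now + duration > deadline:
            raise SimulatedTimeout(f"timed out after {fetched} batches")
        now += duration
        fetched += 1
    return fetched


batches = [2000] * 10  # ten fetches of 2 s each: 20 s of cursor lifetime
timeout_ms = 5000

print(iterate_cursor(batches, timeout_ms, ITERATION))  # 10

try:
    iterate_cursor(batches, timeout_ms, CURSOR_LIFETIME)
except SimulatedTimeout as exc:
    print(exc)  # timed out after 2 batches
```

With these numbers every individual fetch fits comfortably within timeoutMS, yet the cursor as a whole does not, which is the situation this ticket describes for drivers that lack a timeoutMode option.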
Motivation
Who is the affected end user?
Customers who enable CSOT timeouts and run a find or aggregate operation. Note that this only applies to drivers that do not implement the timeoutMode option.
How does this affect the end user?
Cursors may time out server-side before customers are done reading from them. Customers may not expect timeoutMS to control cursor lifetime because it is advertised as a client-side timeout mechanism.
How likely is it that this problem or use case will occur?
For customers using drivers that implement CSOT but do not implement the timeoutMode option, it will occur anytime the expected lifetime of a cursor is longer than the configured client-side timeout (i.e. timeoutMS).
If the problem does occur, what are the consequences and how severe are they?
It prevents long-running cursor iterations from completing. Customers may work around the issue by increasing timeoutMS, provided they can determine that timeoutMS is what limits the cursor lifetime.
Is this issue urgent?
No.
Is this ticket required by a downstream team?
The current behavior is causing issues for the TAR (mongosync) team. See GODRIVER-2944.
Is this ticket only for tests?
No.
Acceptance Criteria
- CSOT spec is amended to make the default cursor timeout mode ITERATION.
- (Optional) Rationale section is amended to describe the reason it was switched from CURSOR_LIFETIME to ITERATION.
is depended on by
- DRIVERS-2724 clarify CSOT behavior for gridfs streams (Backlog)
is related to
- PYTHON-4585 Cursor.to_list does not apply client's timeoutMS setting (Closed)
- SERVER-64854 Allow external clients to set maxTimeMSOpOnly (Backlog)
- GODRIVER-2944 Support CSOT spec timeoutMode for non-tailable cursors (Backlog)