[GODRIVER-2944] Support CSOT spec timeoutMode for non-tailable cursors Created: 14/Aug/23 Updated: 26/Sep/23 |
|
| Status: | Scheduled |
| Project: | Go Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Rohan Sharan | Assignee: | Steve Silvester |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
The CSOT spec defines a timeoutMode option for non-tailable cursors that allows the timeout to be applied individually to the initial operation and to each follow-up getMore command, rather than cumulatively across all operations resulting from something like a Find: https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/client-side-operations-timeout.rst#non-tailable-cursor-behavior
Mongosync had a problem with this in HELP-47315, where it set the timeout to 5 minutes by default, which was too short for the whole Find operation (including the getMore commands) to finish. The TAR team currently has REP-3079 filed to mitigate the issue, but adhering to the CSOT spec would be preferable. The failing log in the mongod server logs is the following:
|
| Comments |
| Comment by Matt Dale [ 13/Sep/23 ] |
|
Created DRIVERS-2722 to recommend amending the CSOT spec. |
| Comment by Steve Silvester [ 12/Sep/23 ] |
|
We don't yet have a drivers ticket. A related ticket is |
| Comment by Tim Fogarty [ 07/Sep/23 ] |
|
Thank you for looking into this matt.dale@mongodb.com. We have been forced to implement a messy and imperfect workaround (REP-3079) in mongosync because we cannot set timeoutMode=ITERATION. I would love for us to remove the imperfect workaround asap, so I just want to get an idea of what the next steps are here and what you think the ETA might be. |
| Comment by Matt Dale [ 06/Sep/23 ] |
|
Answering questions:
No. We'd only be avoiding the most detectable cases of inconsistencies between maxTimeMS applied on a find/aggregate and an operation timeout applied on a getMore, but there are all sorts of other timing conditions that could lead to confusion that we couldn't easily detect. Instead, it makes more sense to not default to using timeoutMS to limit cursor lifetimes.
Yes.
Documenting it would reduce confusion, but users need some way to not implicitly set a cursor lifetime. Currently it doesn't seem like there's a way to avoid setting a cursor lifetime when performing a find/aggregate without implementing timeoutMode or changing the default behavior. |
| Comment by Preston Vasquez [ 31/Aug/23 ] |
|
Notes from sync:
|
| Comment by Preston Vasquez [ 30/Aug/23 ] |
|
steve.silvester@mongodb.com shane.harvey@mongodb.com My understanding is that the Go Driver did not implement timeoutMode because we could rely on contexts to time out cursor iterations, as with Python. However, Rohan brings up a good point here in that the operations will cumulatively share a timeout set on the client. Here is a gist written in Go that illustrates this problem: https://gist.github.com/prestonvasquez/24f073cf8e4a0ffbe8f1dbc738a6aa6c Should we make timeoutMode a drivers-wide requirement instead of optional? |
| Comment by Preston Vasquez [ 30/Aug/23 ] |
|
rohan.sharan@mongodb.com Ah, I see. I am going to put this ticket back into triage to (1) sync with Python, as they have a similar implementation, and (2) determine whether this is something the Go Driver team has the bandwidth to address. |
| Comment by Rohan Sharan [ 30/Aug/23 ] |
|
preston.vasquez@mongodb.com The timeout that we're trying to protect against is the following: maxTimeMS is the cumulative time that the server spends processing the original operation as well as any following getMore commands (not including the time in between getMore command processing). That means that we're not talking about timeouts on the client side, but on the server side. As far as I can tell, your example is sleeping on the client side, so the server is probably able to process all events within 100ms. |
| Comment by Preston Vasquez [ 29/Aug/23 ] |
|
rohan.sharan@mongodb.com Here is an example to help illustrate how the client’s timeout shouldn’t affect additional “getMore” requests: https://gist.github.com/prestonvasquez/2b745e8c9de91c94e90ac18d235f1ef5 Does this resemble your use case? |