Go Driver / GODRIVER-2944

Support CSOT spec timeoutMode for non-tailable cursors

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      Context

      The current CSOT behavior of including maxTimeMS on Find and Aggregate operations when timeoutMS is set limits the server-side cursor lifetime to timeoutMS. Users don't expect an operation-level timeout to apply to cursor lifetimes, and are surprised when it does. However, there are also cases where it's important to be able to set a server-side timeout for Find and Aggregate operations so that the "background reads" feature (see GODRIVER-3172) can help prevent connection churn for those operations.

      Since some customers require one behavior and some require the other, we should add the timeoutMode option to Find and Aggregate operations so users can pick which behavior they want.

      Open questions:

      Should timeoutMode be a Client-level config or only an operation-level config?

      The way the CSOT spec describes timeoutMode seems to suggest it can only be configurable at the operation level:

      If timeoutMode is set to ITERATION, drivers MUST raise a client-side error if the operation is an aggregate with a $out or $merge pipeline stage.

      Tailable cursors only support the ITERATION value for the timeoutMode option. This is the default value and drivers MUST error if the option is set to CURSOR_LIFETIME.

      and about the Watch operation:

      These helpers MUST NOT support the timeoutMode option as change streams are an abstraction around tailable-awaitData cursors, so they implicitly use ITERATION mode.

      If timeoutMode could be set at the Client level, it would be possible to create a Client with timeoutMode=ITERATION that can never run an Aggregate with an $out or $merge stage, which seems like unexpected behavior. It seems like timeoutMode should only be configurable at the operation level.
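
      The client-side checks quoted from the spec above could be sketched roughly as follows. This is only an illustration: the TimeoutMode type, its constants, and validateTimeoutMode are hypothetical names, not part of the Go Driver API, and the pipeline is reduced to a list of stage names for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// TimeoutMode is a hypothetical type used only for this sketch.
type TimeoutMode int

const (
	Iteration TimeoutMode = iota
	CursorLifetime
)

// validateTimeoutMode applies the two client-side rules quoted from the spec:
// tailable cursors reject CursorLifetime, and Iteration rejects pipelines
// that contain an $out or $merge stage.
func validateTimeoutMode(mode TimeoutMode, tailable bool, stages []string) error {
	if tailable && mode == CursorLifetime {
		return errors.New("tailable cursors only support timeoutMode ITERATION")
	}
	if mode == Iteration {
		for _, stage := range stages {
			if stage == "$out" || stage == "$merge" {
				return fmt.Errorf("timeoutMode ITERATION cannot be used with a %s stage", stage)
			}
		}
	}
	return nil
}

func main() {
	// Rejected: Iteration combined with $out.
	fmt.Println(validateTimeoutMode(Iteration, false, []string{"$match", "$out"}))
	// Accepted: CursorLifetime combined with $out.
	fmt.Println(validateTimeoutMode(CursorLifetime, false, []string{"$match", "$out"}))
}
```

      Because the check depends on the pipeline of each individual operation, a Client-wide ITERATION setting would make every $out or $merge Aggregate on that Client fail this validation.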

      Original description:

      The CSOT spec describes a timeoutMode option for non-tailable cursors. With it, the timeout is not applied cumulatively across all operations resulting from something like a Find; instead, it applies individually to the initial operation and to each follow-up getMore command: https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/client-side-operations-timeout.rst#non-tailable-cursor-behavior

      Mongosync ran into this in HELP-47315: it set the timeout to 5 minutes by default, which was too short for the whole Find operation (including the getMore commands) to finish. The TAR team currently has REP-3079 filed to mitigate the issue, but adhering to the CSOT spec would be preferable. The failing entry in the mongod server logs is:

      {"t":{"$date":"2023-08-10T08:01:37.411+00:00"},"s":"W",  "c":"QUERY",    "id":20478,   "ctx":"conn7899290","msg":"getMore command executor error","attr":{"error":{"code":50,"codeName":"MaxTimeMSExpired","errmsg":"operation exceeded time limit"},"stats":{"stage":"FETCH","filter":{"$and":[{"$expr":{"$gte":["$_id",{"$const":{"$oid":"60a5a031b46b1000131feef3"}}]}},{"$expr":{"$lte":["$_id",{"$const":{"$oid":"63c70228caa93700121126f7"}}]}}]},"nReturned":15402655,"works":15402655,"advanced":15402655,"needTime":0,"needYield":0,"saveState":32929,"restoreState":32928,"isEOF":0,"docsExamined":15402655,"alreadyHasObj":0,"inputStage":{"stage":"IXSCAN","nReturned":15402655,"works":15402655,"advanced":15402655,"needTime":0,"needYield":0,"saveState":32929,"restoreState":32928,"isEOF":0,"keyPattern":{"_id":1},"indexName":"_id_","isMultiKey":false,"multiKeyPaths":{"_id":[]},"isUnique":true,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"_id":["[ObjectId('60a5a031b46b1000131feef3'), ObjectId('63c70228caa93700121126f7')]"]},"keysExamined":15402655,"seeks":1,"dupsTested":0,"dupsDropped":0}},"cmd":{"getMore":7958325070821658687,"collection":"assumptions"}}} 

      Definition of done

      • Add a timeoutMode option for Find and Aggregate that can be either "Iteration" or "CursorLifetime". The default should be "Iteration". See the CSOT Cursors section for behavior details.
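
      One possible shape for such an option, shown only as a sketch: the findOptions struct, its setter, and effectiveTimeoutMode are hypothetical stand-ins and do not reflect the Go Driver's actual options API. The point is that an unset option resolves to "Iteration", as proposed above.

```go
package main

import "fmt"

// TimeoutMode is a hypothetical type used only for this sketch.
type TimeoutMode string

const (
	TimeoutModeIteration      TimeoutMode = "Iteration"
	TimeoutModeCursorLifetime TimeoutMode = "CursorLifetime"
)

// findOptions is a hypothetical stand-in for an operation-level options
// struct. A nil TimeoutMode means the caller did not set the option.
type findOptions struct {
	TimeoutMode *TimeoutMode
}

// SetTimeoutMode records an explicit choice and returns the options for chaining.
func (o *findOptions) SetTimeoutMode(m TimeoutMode) *findOptions {
	o.TimeoutMode = &m
	return o
}

// effectiveTimeoutMode resolves the mode to use, falling back to the
// proposed default of Iteration when nothing was set.
func effectiveTimeoutMode(o *findOptions) TimeoutMode {
	if o == nil || o.TimeoutMode == nil {
		return TimeoutModeIteration
	}
	return *o.TimeoutMode
}

func main() {
	fmt.Println(effectiveTimeoutMode(&findOptions{}))
	fmt.Println(effectiveTimeoutMode((&findOptions{}).SetTimeoutMode(TimeoutModeCursorLifetime)))
}
```

      Using a pointer keeps "not set" distinguishable from an explicit choice, which matters if the default is later changed to match the spec.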

      Pitfalls

      The recommended implementation here does not align with the current CSOT spec, but it does align with the changes recommended in DRIVERS-2722. If we decide to keep the CSOT spec the way it is, the Go Driver will behave differently from the spec and possibly from other drivers.

            Assignee: Unassigned
            Reporter: Rohan Sharan (rohan.sharan@mongodb.com)
            Votes: 0
            Watchers: 10