DRIVERS-2035

Use minimum RTT for CSOT maxTimeMS calculation instead of 90th percentile

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Component/s: CSOT

      Drivers must use the minimum RTT, instead of the 90th percentile RTT, when calculating maxTimeMS for CSOT. At least 2 RTT samples are required; otherwise drivers must use 0 as the RTT. Drivers must keep at most the last 10 samples. These changes avoid preemptively failing operations because of inaccurate or unstable RTT measurements.

      Spec change commit: https://github.com/mongodb/specifications/commit/c06650d86f7e47ea30cb2d992942bcec6ef155f9
      Spec change PR: https://github.com/mongodb/specifications/pull/1350
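
      The sampling rule above can be sketched as follows. This is a minimal illustration, not any driver's actual implementation: the class name `MinRTTWindow`, the function `compute_max_time_ms`, and the convention of returning `None` when no time remains are all assumptions made for this example. The subtraction of the RTT from the remaining timeout reflects the general CSOT approach as I understand it; consult the spec change linked above for the normative wording.

```python
from collections import deque
from typing import Optional


class MinRTTWindow:
    """Hypothetical helper that tracks recent RTT samples for one server."""

    MAX_SAMPLES = 10  # keep at most the last 10 samples
    MIN_SAMPLES = 2   # with fewer samples than this, report 0 as the RTT

    def __init__(self) -> None:
        # A bounded deque silently evicts the oldest sample when full.
        self._samples = deque(maxlen=self.MAX_SAMPLES)

    def add_sample(self, rtt_ms: float) -> None:
        self._samples.append(rtt_ms)

    def min_rtt_ms(self) -> float:
        if len(self._samples) < self.MIN_SAMPLES:
            return 0.0
        return min(self._samples)


def compute_max_time_ms(remaining_timeout_ms: float,
                        window: MinRTTWindow) -> Optional[float]:
    """Return the maxTimeMS to send, or None if no time would remain.

    Returning None is an assumed convention for this sketch; a real driver
    would raise its own timeout error instead of sending the operation.
    """
    max_time_ms = remaining_timeout_ms - window.min_rtt_ms()
    return max_time_ms if max_time_ms > 0 else None
```

      With a single sample the window reports 0, so the full remaining timeout is passed as maxTimeMS; once two or more samples exist, the smallest of the last 10 is subtracted.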

      Key            Status/Resolution  FixVersion
      PYTHON-3616    Fixed              4.4
      GODRIVER-2762  Fixed              2.0.0
      NODE-5825      Fixed              6.6.0

      In the PR review for the timeout spec, matt.dale made a suggestion that was never resolved. To quote:

      Using the 90th percentile RTT latency will result in some operations that are likely to complete being cancelled instead.

      Let's consider a Find operation that completes quickly on the server (i.e. <1ms) running on an Atlas cluster, so almost all of the latency is from the network round trip. There are 3 buckets of timing conditions the driver will encounter:

      1. The client-side deadline is greater than (now + max observed RTT); the operation will almost certainly complete before the deadline.
      2. The client-side deadline is between [(now + min observed RTT), (now + max observed RTT)]; the operation may complete or may fail due to timeout.
      3. The client-side deadline is less than (now + min observed RTT); the operation will almost certainly fail due to timeout.

      The operations we're interested in are in bucket 2. By assuming the network round trip will take the 90th percentile observed RTT, we may cancel operations that have a nearly 90% chance of completing before the deadline. Cancelling operations is dangerous because we're actually preventing the driver from doing work. We should instead bias toward cancelling as few operations that have a reasonable chance of completing as possible, in exchange for also letting more operations time out.
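
      The bucket reasoning above can be made concrete with a small sketch. The function names and all numbers here are hypothetical and chosen only for illustration; the send check assumes the driver skips an operation when the remaining time minus the RTT estimate is not positive.

```python
def classify_deadline(remaining_ms: float,
                      min_rtt_ms: float,
                      max_rtt_ms: float) -> int:
    """Map the time left before the deadline to bucket 1, 2, or 3."""
    if remaining_ms > max_rtt_ms:
        return 1  # will almost certainly complete
    if remaining_ms >= min_rtt_ms:
        return 2  # may complete or may time out
    return 3      # will almost certainly time out


def would_send(remaining_ms: float, rtt_estimate_ms: float) -> bool:
    """Assumed rule: send only if time remains after the estimated RTT."""
    return remaining_ms - rtt_estimate_ms > 0


# Illustrative (made-up) RTT statistics, in milliseconds.
min_rtt, p90_rtt, max_rtt = 10.0, 40.0, 50.0
remaining = 25.0

bucket = classify_deadline(remaining, min_rtt, max_rtt)
sent_with_p90 = would_send(remaining, p90_rtt)
sent_with_min = would_send(remaining, min_rtt)
```

      In this example the operation falls in bucket 2. Using the 90th percentile as the estimate cancels it before it is sent, while using the minimum RTT lets it proceed and possibly complete.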

      I propose that we change the cancellation threshold to the 5-minute minimum RTT (i.e. minimum RTT observed in the last 5 minutes) instead of the 90th percentile. While the 10th or 25th percentile more closely match the "reasonable chance of succeeding" threshold, the added complexity of using the t-digest algorithm doesn't seem to justify the small optimization.

      We should reconsider the 90th percentile RTT heuristic used to decide whether to send an operation and to set maxTimeMS.

            Assignee: Shane Harvey (shane.harvey@mongodb.com)
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 0
            Watchers: 5
