Java Driver / JAVA-4146

Fix timeout handling in `DefaultConnectionPool.getAsync`

    Details

    • Type: Task
    • Status: Closed
    • Priority: Unknown
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: Connection Management
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Documentation Changes:
      Not Needed

      Description

      Before implementing maxConnecting

      This section describes how asynchronous checkouts were implemented in the codebase from commit 3b544aad086ea9b11039e1d188fbfa5ce12f7795 (Nov 25, 2014) up to, but not including, commit ead0357131c1c0baa364656b07c91f2c789918e3 (Apr 14, 2021).

      DefaultConnectionPool.getAsync uses a single dedicated thread to handle all asynchronous checkout requests. This naturally queues all such requests, and the thread does not detect that a request has timed out until it starts serving it. As a result, if the request currently being served is blocked because maxSize is reached, the effective timeout of any queued asynchronous request becomes at least as long as the time the currently served request stays blocked. In other words, the pool may significantly overrun the timeouts of asynchronous requests.

      However, if we constrain all asynchronous checkout timeouts to be equal (which is the case for now, but will no longer be true once CSOT is implemented), then a request closer to the tail of the queue cannot time out before the request that is currently being served (the head request). When the head request times out, the dedicated thread is unblocked and proceeds to serve the next request in line. Thus, the described approach, while incorrect when request timeouts may differ, works correctly when all timeouts are equal.
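
      The behaviour described in the two paragraphs above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: all requests are enqueued at time 0, no connection ever becomes available, and the class and method names are hypothetical rather than taken from the driver.

```java
/** Hypothetical simulation; not driver code. */
final class SingleThreadCheckoutSimulation {
    /**
     * Returns, for each queued request, the time at which a single serving thread
     * notices its timeout. Assumes every request is enqueued at time 0 and that
     * no connection ever becomes available.
     */
    static long[] detectionTimes(long[] timeoutsMillis) {
        long[] detected = new long[timeoutsMillis.length];
        long now = 0;
        for (int i = 0; i < timeoutsMillis.length; i++) {
            // The thread blocks on request i until its deadline; if the deadline
            // has already passed, the timeout is detected immediately.
            now = Math.max(now, timeoutsMillis[i]);
            detected[i] = now;
        }
        return detected;
    }
}
```

      With timeouts of 10000 ms (head) and 1000 ms (tail), the tail request's timeout is only detected at 10000 ms. With three equal timeouts of 5000 ms, every timeout is detected on time at 5000 ms.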

      If we had different timeouts, we could maintain the request queue explicitly and choose each blocking time to be the minimum of the remaining times of all queued requests, including the one being served. The thread would block for this amount of time, then expire and remove from the queue all requests that have timed out (this approach is used in com.mongodb.internal.connection.LoadBalancedCluster). When all timeouts are equal, the minimum remaining time is that of the request being served, so no additional machinery is needed to properly handle timeouts of all queued requests.
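
      A minimal sketch of the explicit-queue idea follows. It is not the code of LoadBalancedCluster or of the pool; CheckoutQueue, Request, nextBlockNanos, and expire are hypothetical names, and the current time is passed in explicitly so that the logic is easy to follow. The sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; none of these names come from the driver. */
final class CheckoutQueue {
    /** A queued checkout request with an absolute deadline in nanoseconds. */
    static final class Request {
        final String id;
        final long deadlineNanos;

        Request(String id, long deadlineNanos) {
            this.id = id;
            this.deadlineNanos = deadlineNanos;
        }
    }

    // Holds every pending request, including the one currently being served.
    private final List<Request> queue = new ArrayList<>();

    void add(Request request) {
        queue.add(request);
    }

    int size() {
        return queue.size();
    }

    /**
     * How long the serving thread may block: the minimum remaining time over
     * all queued requests. Returns Long.MAX_VALUE when the queue is empty.
     */
    long nextBlockNanos(long nowNanos) {
        long min = Long.MAX_VALUE;
        for (Request r : queue) {
            min = Math.min(min, Math.max(0, r.deadlineNanos - nowNanos));
        }
        return min;
    }

    /** Removes and returns every request whose deadline has passed. */
    List<Request> expire(long nowNanos) {
        List<Request> expired = new ArrayList<>();
        for (Iterator<Request> it = queue.iterator(); it.hasNext();) {
            Request r = it.next();
            if (r.deadlineNanos - nowNanos <= 0) {
                expired.add(r);
                it.remove();
            }
        }
        return expired;
    }
}
```

      In this sketch the serving thread would block for nextBlockNanos(now) at most, call expire(now) after waking up, and fail each returned request with a timeout.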

      After implementing maxConnecting

      Since commit ead0357131c1c0baa364656b07c91f2c789918e3 (Apr 14, 2021), the situation has changed. Async checkout requests may now also be blocked because maxConnecting is reached, and such blocking is done in another dedicated thread. Thus, we now have two queues of async checkout requests, one per dedicated thread. It is no longer true that, of the requests in the second queue, the one with the smallest remaining time before timing out is the one currently being served (the head request). In other words, even without CSOT, the current implementation may significantly overrun the timeouts of async checkout requests.

      Approaches to solve the problem

      One approach would be to handle timeouts of queued async requests as described above, i.e., by changing the logic of each dedicated thread. Another approach is to add one more dedicated thread that monitors which queued requests have timed out, removes them from the queues, and completes them with a timeout exception. Both approaches require us to maintain the request queues explicitly. The first approach consumes fewer resources (no extra dedicated thread is needed), while the second approach does not require many changes to the existing logic of the two dedicated threads (because the new timeout logic lives in a third thread, separate from the existing logic).
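
      The second approach could look roughly like the sketch below. It is an illustration only, not the eventual fix. CheckoutTimeoutMonitor and its methods are hypothetical, and CompletableFuture stands in for whatever completion mechanism the pool actually uses for async checkouts.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a separate timeout-monitoring thread; not driver code. */
final class CheckoutTimeoutMonitor {
    // Pending requests mapped to their absolute deadlines in nanoseconds.
    private final Map<CompletableFuture<?>, Long> deadlines = new ConcurrentHashMap<>();

    /** Tracks a request until it completes, either normally or by timing out. */
    void register(CompletableFuture<?> request, long deadlineNanos) {
        deadlines.put(request, deadlineNanos);
        // Stop tracking as soon as the request completes for any reason.
        request.whenComplete((result, error) -> deadlines.remove(request));
    }

    int pending() {
        return deadlines.size();
    }

    /** Fails every tracked request whose deadline has passed; returns how many were failed. */
    int expireTimedOut(long nowNanos) {
        int expired = 0;
        for (Map.Entry<CompletableFuture<?>, Long> e : deadlines.entrySet()) {
            if (e.getValue() - nowNanos <= 0
                    && e.getKey().completeExceptionally(new TimeoutException("checkout timed out"))) {
                expired++;
            }
        }
        return expired;
    }

    /** Starts one daemon thread that scans for timed-out requests periodically. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, "checkout-timeout-monitor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(
                () -> expireTimedOut(System.nanoTime()), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

      The two existing dedicated threads would only need to call register when a request is queued. A periodic scan is used here for brevity, so a timeout may be detected up to one period late; sleeping until the nearest deadline would be more precise.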

      My plan is to try the second approach because it is simpler. It appears that the new (third) dedicated thread may be shared between all pools of the same MongoClient, thus significantly reducing the overhead. However, if we decide to implement such sharing, it should be done as a separate task.

              People

              Assignee:
              valentin.kovalenko Valentin Kavalenka
              Reporter:
              valentin.kovalenko Valentin Kavalenka