  1. Java Driver
  2. JAVA-3690

Domain name resolution issues break DefaultConnectionPool when using getAsync

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.1.0
    • Affects Version/s: 3.9.0, 4.0.0
    • Labels:
      None

      I've recently experienced intermittent DNS resolution issues while working on a project, and these issues eventually resulted in all future queries failing with "Timeout waiting for a pooled item ...". Restarting the program helped, but the problem would resurface after enough resolution failures.

      Now, this is technically a Scala application using the Scala driver, but I was able to pinpoint the problem. Invoking getAsync(SingleResultCallback<InternalConnection>) on DefaultConnectionPool invokes openAsync to open the pooled connection if it isn't already open. When the connection is backed by an AsynchronousSocketChannelStream or a NettyStream, that in turn invokes the stream's openAsync(AsyncCompletionHandler<Void>), which executes serverAddress.getSocketAddresses(). It appears that openAsync in DefaultConnectionPool only expects exceptions via its callback, but exceptions thrown from serverAddress.getSocketAddresses() are propagated all the way back to it.
      Now, at least in the Scala driver, the exception is eventually caught by an ErrorHandlingResultCallback, which stops it from propagating further. But nothing releases the connection back to the pool. This eventually exhausts the pool and makes it unusable.
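
The leak described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: TinyPool, its semaphore-based slots, and the simulated RuntimeException are hypothetical stand-ins and not the driver's actual classes. The real pool also waits before timing out, whereas the sketch fails immediately.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Simplified, hypothetical model of the failure mode; none of these types are driver classes.
class PoolLeakSketch {

    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int size) {
            permits = new Semaphore(size);
        }

        int available() {
            return permits.availablePermits();
        }

        // Checks out a slot, then opens the connection asynchronously.
        void getAsync(boolean resolutionFails, Consumer<Throwable> callback) {
            if (!permits.tryAcquire()) {
                // The real pool waits before giving up; the sketch fails immediately.
                callback.accept(new IllegalStateException("Timeout waiting for a pooled item"));
                return;
            }
            // The slot is only handed back when the failure arrives through the handler.
            openAsync(resolutionFails, error -> {
                if (error != null) {
                    permits.release();
                }
                callback.accept(error);
            });
        }

        // Models a stream whose address resolution runs outside any try block.
        private void openAsync(boolean resolutionFails, Consumer<Throwable> handler) {
            if (resolutionFails) {
                throw new RuntimeException("simulated DNS resolution failure");
            }
            handler.accept(null);
        }
    }

    // Runs `failures` failing checkouts and returns how many slots remain.
    static int availableAfterFailures(int poolSize, int failures) {
        TinyPool pool = new TinyPool(poolSize);
        for (int i = 0; i < failures; i++) {
            try {
                pool.getAsync(true, error -> { });
            } catch (RuntimeException swallowed) {
                // Stands in for an outer callback that catches the exception and stops it.
            }
        }
        return pool.available();
    }

    public static void main(String[] args) {
        System.out.println("slots left: " + availableAfterFailures(2, 2));
    }
}
```

With a pool of two slots and two failed resolutions, the model ends with no slots left, because the thrown exception bypasses the only code path that releases a slot.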

      After enabling trace logs I noticed that the connection that was being opened right before ErrorHandlingResultCallback logged an error was always lost. Even after 3 hours it was never checked back into the pool or referenced in any other log message.

      I believe that the openAsync methods in AsynchronousSocketChannelStream and NettyStream should capture throwables and use them to fail the AsyncCompletionHandler. I'm not sure whether that could break any existing use cases. However, based on the history of these two files, the call to serverAddress.getSocketAddresses() used to be inside a try block, but was moved out of it when JAVA-2700 added support for connecting to all IPs and part of the method was made recursive.
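
The proposed change could look roughly like the following. This is a minimal sketch, not the driver's implementation: the Handler interface is a hypothetical stand-in for AsyncCompletionHandler<Void>, the resolver Supplier stands in for serverAddress.getSocketAddresses(), and connect is a placeholder for the recursive connection attempt.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of the suggested fix; names do not match the driver's internals.
class OpenAsyncFixSketch {

    // Stand-in for the driver's AsyncCompletionHandler<Void>.
    interface Handler {
        void completed();
        void failed(Throwable t);
    }

    // Resolution runs inside a try block so failures reach the handler instead of the caller.
    static void openAsync(Supplier<List<InetSocketAddress>> resolver, Handler handler) {
        List<InetSocketAddress> addresses;
        try {
            addresses = resolver.get();
        } catch (Throwable t) {
            handler.failed(t);
            return;
        }
        connect(addresses, handler);
    }

    // Placeholder for trying each address in turn; the sketch succeeds if any address exists.
    private static void connect(List<InetSocketAddress> addresses, Handler handler) {
        if (addresses.isEmpty()) {
            handler.failed(new IllegalStateException("no addresses to try"));
        } else {
            handler.completed();
        }
    }

    // Reports which handler method was invoked for a given resolver.
    static String outcome(Supplier<List<InetSocketAddress>> resolver) {
        AtomicReference<String> result = new AtomicReference<>("handler never invoked");
        openAsync(resolver, new Handler() {
            @Override
            public void completed() {
                result.set("completed");
            }

            @Override
            public void failed(Throwable t) {
                result.set("failed: " + t.getMessage());
            }
        });
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> {
            throw new RuntimeException("simulated DNS resolution failure");
        }));
        System.out.println(outcome(() ->
                Collections.singletonList(InetSocketAddress.createUnresolved("localhost", 27017))));
    }
}
```

Because the failure now arrives through the handler, a pool that releases connections from its open callback would get the chance to do so.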

            Assignee:
            john.stewart@mongodb.com John Stewart (Inactive)
            Reporter:
            metod@medja.net Metod Medja
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: