[JAVA-3690] Domain name resolution issues break DefaultConnectionPool when using getAsync Created: 09/Apr/20 Updated: 28/Oct/23 Resolved: 30/Apr/20 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Async, Connection Management |
| Affects Version/s: | 3.9.0, 4.0.0 |
| Fix Version/s: | 4.1.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Metod Medja | Assignee: | John Stewart (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Description |
|
I recently experienced intermittent DNS resolution issues while working on a project, and these issues eventually resulted in all future queries failing with "Timeout waiting for a pooled item ...". Restarting the program helped, but the problem would resurface after enough resolution failures. This is technically a Scala application using the Scala driver, but I was able to pinpoint the problem.

Invoking getAsync(SingleResultCallback<InternalConnection>) on DefaultConnectionPool invokes openAsync to open the pooled connection if it isn't already open. When the connection is backed by an AsynchronousSocketChannelStream or a NettyStream, that in turn invokes the stream's openAsync(AsyncCompletionHandler<Void>), which executes serverAddress.getSocketAddresses(). It appears that openAsync in DefaultConnectionPool only expects exceptions via its callback, but exceptions thrown from serverAddress.getSocketAddresses() are propagated all the way back to it.

After enabling trace logs, I noticed that the connection being opened right before ErrorHandlingResultCallback logged an error was always lost. Even after 3 hours it was never checked back into the pool or referenced in any other log message.

I believe that the openAsync methods in AsynchronousSocketChannelStream and NettyStream should capture throwables and use them to fail the AsyncCompletionHandler. I'm not sure whether that could break any existing use cases. Based on the history of these two files, though, the execution of serverAddress.getSocketAddresses() used to be inside a try block, but was moved out of it when |
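To illustrate the suggestion above, here is a minimal, self-contained sketch of an openAsync that catches anything thrown during address resolution and reports it through the handler instead of letting it escape to the caller. This is not the driver's code: `CompletionHandler`, `AddressResolver`, and `OpenAsyncSketch` are hypothetical stand-ins for AsyncCompletionHandler<Void>, serverAddress.getSocketAddresses(), and the stream classes, and the actual fix in the driver may look different.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class OpenAsyncSketch {

    /** Hypothetical stand-in for the driver's AsyncCompletionHandler<Void>. */
    interface CompletionHandler {
        void completed();
        void failed(Throwable t);
    }

    /** Hypothetical stand-in for serverAddress.getSocketAddresses(); may throw on DNS failure. */
    interface AddressResolver {
        List<InetSocketAddress> resolve() throws Exception;
    }

    /**
     * Resolves the addresses inside a try block so that a resolution failure
     * reaches the caller through handler.failed(...) rather than as a thrown exception.
     */
    static void openAsync(AddressResolver resolver, CompletionHandler handler) {
        List<InetSocketAddress> addresses;
        try {
            addresses = resolver.resolve();
        } catch (Throwable t) {
            // The caller only listens on the handler, so route the failure there.
            handler.failed(t);
            return;
        }
        if (addresses.isEmpty()) {
            handler.failed(new IllegalStateException("no addresses resolved"));
            return;
        }
        // A real stream would start the asynchronous connect here;
        // the sketch just reports success.
        handler.completed();
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        // Simulate a DNS failure: nothing is thrown out of openAsync itself.
        openAsync(
            () -> { throw new UnknownHostException("db.invalid"); },
            new CompletionHandler() {
                @Override public void completed() { System.out.println("opened"); }
                @Override public void failed(Throwable t) { failure.set(t); }
            });
        System.out.println("failed via handler: " + failure.get());
    }
}
```

With this shape, a pool that releases the connection in its failure callback would get the chance to do so, because the resolution error arrives on the same path as any other connect failure.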
| Comments |
| Comment by Venky Chowdary [ 21/Sep/22 ] |
|
Thank you for the quick response, Jeffrey. We hit a similar issue in production where all the threads are stuck and Do you have any suggestions on what kind of scenarios can hit this problem? Thanks, |
| Comment by Jeffrey Yemin [ 20/Sep/22 ] |
|
malempati77@gmail.com, I don't think this particular issue is present in the 3.7 release. The regression that this fixed was introduced in the commit for |
| Comment by Venky Chowdary [ 19/Sep/22 ] |
|
Is this issue applicable to the 3.7.2 driver? We are experiencing a similar issue with MongoDB 3.6.9 and MongoDB driver 3.7.2. |
| Comment by Githook User [ 30/Apr/20 ] |
|
Author: {'name': 'John Stewart', 'email': 'john.stewart@mongodb.com', 'username': 'jstewart-mongo'}
Message: Domain name resolution issues break DefaultConnectionPool when using getAsync
|