[JAVA-481] Driver not retrying on Connection timed out SocketException Created: 02/Dec/11 Updated: 03/Jan/18 Resolved: 02/Dec/11 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Performance |
| Affects Version/s: | 2.6.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | John Danner | Assignee: | Mariano Escribano |
| Resolution: | Done | Votes: | 0 |
| Labels: | connection, driver, query | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I've got a MongoDB replica set spanning two data centers. In the second data center I have some servers that point back to the primary instance in data center 1. On those servers I run into a connection timeout issue pretty consistently; the stack trace begins: com.mongodb.DBPortPool gotError

Is it possible for the driver to attempt to recreate the connections and retry the query? It looks like the next query worked as expected. I do not see these errors on servers in the same data center as the primary mongodb server.

Note: latency between my app server and the primary mongodb server is ~50 ms. THANKS! |
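As the maintainer notes in the comments below, the driver never retries a read after a timeout, so the usual workaround is a small retry wrapper in application code. A minimal sketch of that idea (the class and method names are mine, not part of the driver; it assumes the 2.x driver line, where socket failures surface as MongoException.Network):

```java
import java.util.concurrent.Callable;

import com.mongodb.MongoException;

// Hypothetical application-level helper, not part of the driver: retry a read
// a bounded number of times when a network error surfaces.
public final class RetryingReads {
    public static <T> T readWithRetry(Callable<T> query, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        MongoException.Network last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.call();
            } catch (MongoException.Network e) {
                // The pool discards the dead socket, so the next attempt
                // should check out a fresh connection.
                last = e;
            }
        }
        throw last;
    }
}
```

Wrapping a read is then just a matter of passing the query as a Callable, e.g. a findOne() on the relevant collection with maxAttempts set to 2.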
| Comments |
| Comment by Brett Cave [ 17/Apr/13 ] |
|
This issue also occurs frequently in AWS EC2 (Amazon Web Services); we see it daily. When reviewing our mongo configuration for production, we came across the production checklist, which suggests dropping the TCP_KEEPALIVE kernel setting to a lower value. After dropping it to 5 minutes, the errors started occurring much more frequently. We have now set keep-alive back to 7200 seconds (at the OS level, not the driver level) to reduce how often this occurs. |
| Comment by John Danner [ 26/Mar/12 ] |
|
I think the issue is that by default Linux waits 2 hours before sending the first keep-alive packet, while the default idle-connection timeout on a Cisco ASA is 1 hour - so even with socketKeepAlive set, the firewall drops the connection before the first keep-alive packet is ever sent, and the setting won't really do anything in this situation/configuration. I'll adjust one of my systems' keep-alive parameters to see if it fixes my issue - perhaps if someone else happens upon this ticket they will find it useful. If there is a desire for a more robust keep-alive that isn't reliant on OS/firewall settings, I can submit this code as a patch. |
| Comment by Antoine Girbal [ 26/Mar/12 ] |
|
that should be the point of socketKeepAlive. |
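For reference, the option is set through MongoOptions in the 2.x driver line this ticket is filed against. A minimal sketch (the host name is a placeholder):

```java
import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ServerAddress;

public class KeepAliveExample {
    public static void main(String[] args) throws Exception {
        MongoOptions opts = new MongoOptions();
        opts.socketKeepAlive = true;  // request TCP keep-alive probes on driver sockets

        // "primary.dc1.example" is a placeholder for the primary's address.
        Mongo mongo = new Mongo(new ServerAddress("primary.dc1.example", 27017), opts);
        DB db = mongo.getDB("test");
        System.out.println(db.command("ping"));
    }
}
```

As the comments above point out, this only asks the OS to send keep-alive probes; when the first probe goes out is still governed by the kernel's keep-alive settings.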
| Comment by John Danner [ 26/Mar/12 ] |
|
I should note that I'm using the socketKeepAlive option within the driver, but the connectivity problem remains. |
| Comment by John Danner [ 26/Mar/12 ] |
|
I believe the root cause of the issue I'm experiencing is a firewall closing down inactive connections. The driver isn't aware of a connection's status until it uses the connection and the resulting error is passed up to the application. I've updated the driver to periodically (at a configurable interval) issue the ping command on connections that have not been used in a configurable amount of time. I believe this will resolve the issue for me - would this patch be an acceptable addition to the default driver? The patch runs a monitoring thread that spins through the list of available DBPorts and checks their last use; if a port has been idle for over 15 minutes, the ping command is issued, which should keep the connection alive through a firewall or similar device. |
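The patch itself isn't attached to the ticket; the following is a minimal sketch of the idea under discussion (the class name is mine, and this version pings on a fixed schedule rather than tracking per-DBPort idle time as the described patch does):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.mongodb.DB;

// Illustrative stand-in for the described patch: generate periodic traffic so
// firewalls that drop quiet connections see the connection as active.
public final class ConnectionPinger {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(final DB db, long periodMinutes) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    db.command("ping");  // cheap server round trip keeps the socket warm
                } catch (Exception e) {
                    // A failed ping just means the pool will hand the next
                    // real query a fresh connection anyway.
                }
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```

The fix the comment describes lives inside the pool (walking DBPorts and pinging only those idle for more than 15 minutes), but an application-level pinger like this achieves a similar effect without patching the driver.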
| Comment by Antoine Girbal [ 02/Dec/11 ] |
|
Feel free to follow up, though the mongodb-user group may be a better place to get quick answers. |
| Comment by Antoine Girbal [ 02/Dec/11 ] |
|
the driver never retries a read after a timeout exception. Solutions for you:
|