- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Component/s: None
Problem Description
TCP keepalive is disabled by default in the Java driver (and in other drivers). As a result, a connection to a server that has gone down can remain stuck in the middle of a socket read, waiting indefinitely.
We had a situation where a mongos server crashed, leaving 100 open connections on the client side. After we recovered the mongos, the Java driver still had 100 bad connections checked out of the pool and would not open new ones.
As part of this change, drivers should include in their documentation a link to the keepalive section of the MongoDB Diagnostics FAQ.
Specification
- A driver MUST enable TCP keepalive by default. This matches the behavior of the MongoDB server.
- A driver MUST deprecate TCP keepalive-related options in the connection string (and any other way that it is configured), as there is no demonstrated benefit to allowing it to be disabled. This also matches the behavior of the server.
- A driver SHOULD set tcp_keepalive_time to 300 seconds unless it determines that the system default is already less than that. If the driver is unable to determine the system default at all it should not attempt to change it. This matches the behavior of the server as well.
- A driver SHOULD set tcp_keepalive_intvl to 10 seconds unless it determines that the system default is already less than that. If the driver is unable to determine the system default at all it should not attempt to change it. This is not the current behavior of the server, but if accepted here it will be recommended. The reasoning is that with the default of 75 seconds for this value and a default of 9 probes, the actual time to failure is 300 + (75 * 9) = 975 sec = 16.25 minutes. With a 10 second interval between probes it becomes a more reasonable 6.5 minutes.
- A driver SHOULD set tcp_keepalive_cnt to 9 probes unless it determines that the system default is already less than that. If the driver is unable to determine the system default at all it should not attempt to change it.
- A driver MUST document how keepalive-related options are configured. Drivers that can set tcp_keepalive_time and tcp_keepalive_intvl to the values mandated above MUST document that they do so. Drivers that cannot MUST document that they do not and link to the appropriate MongoDB Diagnostics FAQ keepalive section for instructions on setting these values at the system level.
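For illustration only, the rules above could be sketched in Python roughly as follows. This is not taken from any driver's implementation. The names `configure_keepalive`, `_lower_tcp_option` and `worst_case_detection_seconds` are hypothetical, and the sketch assumes a platform whose `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (as Linux does). Options that are missing or unreadable are left unchanged, in line with the "unable to determine the system default" clause.

```python
import socket

# Upper bounds taken from the specification above.
MAX_KEEPALIVE_IDLE_SEC = 300      # tcp_keepalive_time
MAX_KEEPALIVE_INTERVAL_SEC = 10   # tcp_keepalive_intvl
MAX_KEEPALIVE_PROBES = 9          # tcp_keepalive_cnt


def _lower_tcp_option(sock, option_name, max_value):
    """Lower a TCP-level option to max_value if its current value is higher.

    Hypothetical helper: if the platform does not expose the option, or the
    current value cannot be read, the option is left untouched.
    """
    option = getattr(socket, option_name, None)
    if option is None:
        return  # option not available on this platform
    try:
        current = sock.getsockopt(socket.IPPROTO_TCP, option)
        if current > max_value:
            sock.setsockopt(socket.IPPROTO_TCP, option, max_value)
    except OSError:
        pass  # cannot determine or change the default: leave it alone


def configure_keepalive(sock):
    """Enable TCP keepalive and cap its timing parameters on one socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    _lower_tcp_option(sock, "TCP_KEEPIDLE", MAX_KEEPALIVE_IDLE_SEC)
    _lower_tcp_option(sock, "TCP_KEEPINTVL", MAX_KEEPALIVE_INTERVAL_SEC)
    _lower_tcp_option(sock, "TCP_KEEPCNT", MAX_KEEPALIVE_PROBES)


def worst_case_detection_seconds(idle, interval, probes):
    """Idle time before the first probe plus all unanswered probes."""
    return idle + interval * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    configure_keepalive(s)
    s.close()
    # Reproduces the arithmetic from the specification.
    print(worst_case_detection_seconds(300, 75, 9) / 60)  # 16.25 minutes
    print(worst_case_detection_seconds(300, 10, 9) / 60)  # 6.5 minutes
```

The helper only ever lowers a value, so a system default that is already below the mandated bound is preserved.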
- depends on
  - GODRIVER-37 Set TCP keep alive by default (Closed)
  - RUBY-1283 Enable and configure TCP Keepalive by default (Closed)
  - CDRIVER-2176 Enable and configure TCP Keepalive by default (Closed)
  - CSHARP-1994 Enable and configure TCP Keepalive by default (Closed)
  - CXX-1363 Have TCP keepalive default to true (Closed)
  - JAVA-2531 Have TCP keepalive default to true (Closed)
  - NODE-1024 Have TCP keepalive default to true (Closed)
  - PHPC-969 Have TCP keepalive default to true (Closed)
  - PYTHON-1279 Have TCP keepalive default to true (Closed)
  - RUST-170 Enable and configure TCP Keepalive by default (Closed)
- is related to
  - RUBY-1799 Connection options for tcp_keepalive_* not exposed to Mongo::Client (Closed)
  - NODE-6245 Restore keepAliveInitialDelay configurability (Backlog)
- related to
  - RUBY-1211 Add Mongo::TCPSocket Keep-Alive Configuration (Closed)
  - SERVER-29341 Set TCP_KEEPIDLE and TCP_KEEPINTVL (or OS equivalent) whenever available (Closed)
  - GODRIVER-2846 Make expected TCP KeepAlive behavior explicit (Closed)