[SERVER-38811] TCP_KEEPINTVL should be 1 second Created: 02/Jan/19 Updated: 29/Oct/23 Resolved: 23/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.8, 4.0.25 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Mathias Stearn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Backport Requested: | v4.0 |
| Sprint: | Service Arch 2019-01-14, Service Arch 2019-01-28 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Currently we set TCP_KEEPINTVL to 5 minutes, matching TCP_KEEPIDLE, but that seems incorrect. KEEPINTVL is how frequently to probe a connection that has been idle for KEEPIDLE seconds, until we get a successful reply from the remote host, which restarts the KEEPIDLE timer. Essentially this is used to detect blackholed connections, and there is no reason to spend 45 minutes probing (the default TCP_KEEPCNT of 9 * KEEPINTVL) before realizing a connection is dead. With KEEPINTVL at 1 second, the total time to detect a dead connection drops to roughly 5 minutes + 10 seconds. |
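The arithmetic in the description can be checked with a small sketch. It assumes the usual model in which probing starts after TCP_KEEPIDLE seconds of idleness and the connection is declared dead after TCP_KEEPCNT unanswered probes spaced TCP_KEEPINTVL seconds apart. The function name is hypothetical and is not part of the server code.

```python
def worst_case_detection_seconds(keepidle: int, keepintvl: int, keepcnt: int = 9) -> int:
    """Idle time before the first probe plus the time spent on unanswered probes."""
    return keepidle + keepcnt * keepintvl


# Old settings: KEEPIDLE and KEEPINTVL both at 5 minutes (300 s).
old = worst_case_detection_seconds(keepidle=300, keepintvl=300)

# Proposed settings: KEEPIDLE unchanged, KEEPINTVL lowered to 1 s.
new = worst_case_detection_seconds(keepidle=300, keepintvl=1)

print(old, new)
```

Under these assumptions the old settings give 3000 seconds (5 minutes idle plus 45 minutes of probing) and the proposed settings give 309 seconds, which matches the "5 minutes + 10 seconds" estimate to within a second.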
| Comments |
| Comment by Githook User [ 07/May/21 ] |
|
Author: {'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'} Message: |
| Comment by Githook User [ 23/Jan/19 ] |
|
Author: {'email': 'mathias@10gen.com', 'name': 'Mathias Stearn', 'username': 'RedBeard0531'} Message: |
| Comment by Mathias Stearn [ 11/Jan/19 ] |
|
Since there was still some confusion about this, I should be clear that this should not result in any additional network traffic, since we will still only start sending keepalives after a connection has been idle for 5 minutes. The only thing that is changing is how quickly we will retry sending a keepalive message after failing to get a reply. Once we get a reply to the keepalive, we will wait 5 minutes before sending another. This will bring the ratio of KEEPIDLE to KEEPINTVL closer to the Linux defaults: 7200 seconds (2 hours) and 75 seconds. We are currently setting both to the same value, which doesn't really make sense. |
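For illustration only, the settings discussed here could be applied to a socket roughly as follows. This is a Python sketch and not the server's actual C++ networking code; the helper name `enable_keepalive` is hypothetical, and the `TCP_KEEPIDLE` / `TCP_KEEPINTVL` constants are only set where the platform's `socket` module exposes them (Linux does).

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=1):
    """Turn on TCP keepalive and, where the platform exposes them, set the timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    for name, value in (("TCP_KEEPIDLE", idle_s), ("TCP_KEEPINTVL", interval_s)):
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # constant not available on this platform
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        # Read the value back so the caller can see what the kernel accepted.
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    applied = enable_keepalive(sock)
    keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

print(keepalive_on, applied)
```

The idle timer stays at 300 seconds, so a healthy idle connection still sees one probe every 5 minutes. Only the retry spacing after an unanswered probe changes.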
| Comment by Mathias Stearn [ 03/Jan/19 ] |
|
No, TCP_KEEPINTVL is how frequently to retry sending a keepalive after the initial keepalive packet is sent when the connection has been idle for TCP_KEEPIDLE, which will still be 5 minutes. Once the other node replies to the keepalive packet (which should be handled in the kernel without waking user space) it restarts the KEEPIDLE timer. So this shouldn't increase the number of packets sent at all, except in cases of packet loss. |
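A rough back-of-the-envelope model of the point above, assuming every peer answers the first probe so no retries are ever needed. The connection count is a made-up example and the function name is hypothetical.

```python
def healthy_probe_rate(idle_connections: int, keepidle_s: int) -> float:
    """Keepalive probes per second when every probe is answered on the first try."""
    # Each idle connection sends one probe per KEEPIDLE period;
    # KEEPINTVL never comes into play because no probe goes unanswered.
    return idle_connections / keepidle_s


# Hypothetical fleet of 10,000 idle connections with KEEPIDLE = 300 s.
rate = healthy_probe_rate(idle_connections=10_000, keepidle_s=300)
print(round(rate, 1))
```

In this simplified model the rate depends only on KEEPIDLE, so lowering KEEPINTVL from 300 seconds to 1 second leaves it unchanged. Extra probes appear only on connections where a probe goes unanswered.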
| Comment by Andy Schwerin [ 03/Jan/19 ] |
|
Won't this generate a large number of keepalive packets when there are a ton of idle connections? It is common for MongoDB clients to create a large number of mostly idle connections, and mongos connection pooling can lead to a similar state. Performing 1-second keepalives on all of those connections would mean an uptick in packets-per-second. |