[SERVER-16903] Prevent query to single mongod from hanging during server power failure Created: 16/Jan/15  Updated: 09/Apr/20

Status: Open
Project: Core Server
Component/s: Networking
Affects Version/s: 2.8.0-rc5
Fix Version/s: features we're not sure of

Type: Improvement Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Unresolved Votes: 0
Labels: 28qa, move-sa, platforms-re-triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 14.04


Issue Links:
Tested
Participants:

 Description   

During power-cycle testing of a single mongod, we found that a client query hangs when the server terminates suddenly because of a power failure.

The hang ends in a socket exception only after the TCP keepalive period has expired. That period is tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes. The first two values are capped at 5 minutes each, but tcp_keepalive_probes is left at the system default, which is 9 for Windows and Linux. The socket would therefore be terminated only after 5 + 5 * 9 = 50 minutes, which is far too long for a client to wait before learning that the server is gone.
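
The arithmetic above can be checked with a small sketch. The function name and the use of seconds as the unit are illustrative choices, not part of the server code; the inputs are the values quoted in this ticket.

```python
def keepalive_detection_seconds(idle_s, interval_s, probes):
    """Time until a dead peer is detected: idle time plus one interval per probe."""
    return idle_s + interval_s * probes


# Values from the description: 5 minute idle time, 5 minute interval, 9 probes.
worst_case_s = keepalive_detection_seconds(300, 300, 9)
print(worst_case_s / 60)  # 50.0 (minutes)
```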

A possible solution is to set tcp_keepalive_probes to a lower value.
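
As a rough illustration of what lowering the probe count looks like at the socket level, the sketch below sets the per-socket keepalive options from Python. It is not the mongod implementation. The helper name `configure_keepalive` and the probe count of 3 are assumptions made for the example. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` constants are exposed by Python's `socket` module on Linux and may be absent on other platforms, so each one is guarded.

```python
import socket


def configure_keepalive(sock, idle_s=300, interval_s=300, probes=3):
    """Enable TCP keepalive on sock and, where supported, lower the probe count.

    The default of 3 probes is an illustrative value, not a recommendation
    taken from this ticket.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket options below are platform dependent (present on Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = configure_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
sock.close()
```

With the idle time and interval kept at 5 minutes and the probe count lowered to 3, the same formula gives 5 + 5 * 3 = 20 minutes instead of 50.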


Generated at Thu Feb 08 03:42:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.