[SERVER-69063] Fix TCP keepalive option setting Created: 22/Aug/22  Updated: 14/Dec/23  Resolved: 21/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 5.0.24

Type: Improvement Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
  duplicates SERVER-57468: Enable TCP_USER_TIMEOUT by default (Backlog)
Problem/Incident
  is caused by SERVER-57466: Swallow connection reset-related erro... (Closed)
Related
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0, v4.4
Sprint: Service Arch 2022-12-26, Service Arch 2022-09-05, Service Arch 2022-09-19, Service Arch 2022-10-03, Service Arch 2022-10-17, Service Arch 2022-10-31, Service Arch 2022-11-14, Service Arch 2022-11-28, Service Arch 2022-12-12, Service Arch 2023-01-09, Service Arch 2023-01-23, Service Arch 2023-02-06, Service Arch 2023-02-20, Service Arch 2023-03-06, Service Arch 2023-03-20, Service Arch 2023-04-03
Participants:
Case:

 Description   

It seems that SERVER-57466 broke the TCP Keepalive parameter settings on Linux.

I'm looking at this optval variable.
https://github.com/10gen/mongo/pull/695/files#diff-19e9e1bf9007ae5e281e4d8f2c2b4704547ad7917c3557867bf91fdc45e18601L155

Previously, optval was initialized to 1 and then overwritten by getsockopt.
The value retrieved by getsockopt was compared against maxVal to determine whether clipping was necessary, and if so, a setsockopt call overwrote the value attached to the socket.
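
The read, compare, and clip sequence described above can be sketched roughly as follows. This is a minimal Linux-only illustration, not the MongoDB source: the helper name clampTcpOption, its signature, and its return convention are assumptions made for this example. Only getsockopt, setsockopt, and the notion of a maximum value come from the ticket.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper (not MongoDB's actual function): lower an integer
// TCP option to maxVal seconds if the kernel currently reports more.
// Returns the value in effect afterwards, or -1 on error.
int clampTcpOption(int fd, int optName, int maxVal) {
    assert(maxVal >= 1);  // precondition assumed for this sketch

    int current = 0;
    socklen_t len = sizeof(current);
    // Read the value the kernel currently holds for this socket.
    if (getsockopt(fd, IPPROTO_TCP, optName, &current, &len) != 0)
        return -1;

    // Write only when clipping is actually needed.
    if (current > maxVal) {
        if (setsockopt(fd, IPPROTO_TCP, optName, &maxVal, sizeof(maxVal)) != 0)
            return -1;
        current = maxVal;
    }
    return current;
}
```

A caller would invoke this once per option, for example clampTcpOption(fd, TCP_KEEPIDLE, 300), using whichever cap applies at that call site.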

But since SERVER-57466, the rawOptVal read by getsockopt is discarded.
The optVal variable is never rewritten, so it keeps its initial value of 1 second. The clipping never occurs because 1 second is never greater than maxVal, a constant that is either 300 seconds or 1 second depending on the call site.

So the setsockopt calls that configure TCP_KEEPIDLE and TCP_KEEPINTVL never happen.
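
To make the failure mode concrete, the sketch below models only the decision "should setsockopt be called?" as two pure functions. It is a simplified model under stated assumptions, not the code from either commit: the function names are hypothetical, and the sample inputs 7200 and 75 are the usual Linux defaults for keepalive idle time and interval, which a given host may override.

```cpp
#include <cassert>

// Intended behavior: compare the value getsockopt returned against the cap.
bool shouldClipIntended(int valueFromGetsockopt, int maxVal) {
    assert(maxVal >= 1);  // the ticket mentions caps of 300 or 1 second
    return valueFromGetsockopt > maxVal;
}

// Regressed behavior as described in the ticket: the getsockopt result is
// discarded, and the comparison uses a local that still holds its initial 1.
bool shouldClipRegressed(int valueFromGetsockopt, int maxVal) {
    assert(maxVal >= 1);
    (void)valueFromGetsockopt;  // read from the socket, then ignored
    const int optVal = 1;       // never overwritten
    return optVal > maxVal;     // 1 > 300 and 1 > 1 are both false
}
```

With these definitions, shouldClipIntended(7200, 300) and shouldClipIntended(75, 1) are both true, while shouldClipRegressed returns false for the same inputs, so the follow-up setsockopt would be skipped in both cases.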



 Comments   
Comment by Githook User [ 14/Dec/23 ]

Author: Billy Donahue <billy.donahue@mongodb.com> (BillyDonahue)

Message: SERVER-69063 fix the setting of TCP keepalive parameters

GitOrigin-RevId: 41a284c937e2bbc6dd1e505b7e5c3e783325f24b
Branch: v5.0
https://github.com/mongodb/mongo/commit/af365a30af4cd6205f686f9658daea4d6034c332

Comment by Githook User [ 21/Mar/23 ]

Author: Billy Donahue <billy.donahue@mongodb.com> (BillyDonahue)

Message: SERVER-69063 fix the setting of TCP keepalive parameters
Branch: master
https://github.com/mongodb/mongo/commit/80f4b5acfbcfea25d24ae6d78988da7b90f1d46b

Comment by Billy Donahue [ 02/Mar/23 ]

Reviving the code review and merging the current master branch into it.
I will retest as well.

Comment by Billy Donahue [ 25/Aug/22 ]

I think it could be useful to have JUST this fix as a standalone commit.
I'll break off the piece of my PR for SERVER-57466 that applies to this and make it a PR for this ticket.

Comment by Billy Donahue [ 22/Aug/22 ]

blake.oler@mongodb.com can you confirm my reading of this code?
It looks like a mistake arising from converting Seconds to unsigned and back.

Generated at Thu Feb 08 06:12:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.