[SERVER-31400] Record Linux netstat metrics in ftdc Created: 05/Oct/17  Updated: 30/Oct/23  Resolved: 08/May/18

Status: Closed
Project: Core Server
Component/s: Diagnostics, Networking
Affects Version/s: None
Fix Version/s: 3.4.16, 3.6.6, 4.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Bruce Lucas (Inactive)
Resolution: Fixed Votes: 6
Labels: SWDI
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File poc.diff    
Issue Links:
Problem/Incident
causes SERVER-38704 RPM Binary -- SELinux Module Denials Closed
Related
is related to SERVER-31251 Increase backlog specified in listen() Open
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6, v3.4

 Description   

We didn't initially include network metrics in ftdc because we weren't aware of any particular diagnostic value in them. However, we have since discovered that some very long query times can be attributed to specific TCP behavior that can be diagnosed using network metrics on Linux; see SERVER-31251.
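The netstat values are cumulative counters since boot. A TCP event of interest therefore shows up as an increase between two samples rather than in any single value. One example is a burst of ListenOverflows/ListenDrops, the counters most directly tied to the listen backlog discussed in SERVER-31251. The following is a minimal sketch of that comparison. The function name counterDeltas is hypothetical, and this is not code from the attached POC or the committed change.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: given two samples of cumulative counters
// (counter name -> value) taken at successive times, return how much each
// counter present in both samples increased. A counter that went backwards
// (e.g. reset by a reboot) is reported as 0 instead of wrapping around.
std::map<std::string, std::uint64_t> counterDeltas(
    const std::map<std::string, std::uint64_t>& earlier,
    const std::map<std::string, std::uint64_t>& later) {
    std::map<std::string, std::uint64_t> deltas;
    for (const auto& kv : later) {
        auto it = earlier.find(kv.first);
        if (it == earlier.end()) {
            continue;  // counter missing from the earlier sample
        }
        deltas[kv.first] = kv.second >= it->second ? kv.second - it->second : 0;
    }
    return deltas;
}
```

A nonzero delta for ListenOverflows between two adjacent samples would point at the interval in which connections were being dropped or delayed at accept time.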

On Linux, this would mean sampling and recording the contents of /proc/net/netstat, which looks like this:

TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive
TcpExt: 4126 9923 0 0 0 0 0 5 0 0 18 0 0 0 0 0 14118943 329 7785 123805 127931 39075927 379936 330333058 0 30915932 0 468973 31495550 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 3289 118858 118522 0 0 247 0 7785 0 82 0 0 0 0 1 0 0 0 0 0 24 0 0 0 28 32 4 10475 0 0 7 0 6744 0 0 693 10 0 0 0 0 0 0 0 0 0 0 0 0 175 0 0 0 3387 84949059 0 0 9 297 0 0 0 0 0 0 0 54900
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts
IpExt: 0 0 8777 0 676593 0 20992350717 12981098336 331228 0 95096899 0 0 88802695 0 1604 0
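As the sample shows, the file consists of a pair of lines per section (TcpExt, IpExt). The first line names the counters, and the second gives their values in the same order, with both lines starting with the same prefix. The following is a minimal parsing sketch for that layout, producing a nested map keyed by section and counter name. The names parseNetstat and NetstatMap are hypothetical. This is not the code from poc.diff or the committed change, and it assumes every value fits in an unsigned 64-bit integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: section ("TcpExt", "IpExt", ...) -> counter name -> value.
using NetstatMap = std::map<std::string, std::map<std::string, std::uint64_t>>;

// Split a line into whitespace-separated words.
static std::vector<std::string> splitWords(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Hypothetical helper: parse /proc/net/netstat-style text into *out.
// Each section is a header line of counter names followed by a line of
// values, both starting with the same "Prefix:" token.
// Returns false on malformed input: a header with no value line, mismatched
// prefixes or field counts, or a value std::stoull cannot parse.
bool parseNetstat(const std::string& text, NetstatMap* out) {
    std::istringstream in(text);
    std::string header;
    std::string values;
    while (std::getline(in, header)) {
        std::vector<std::string> names = splitWords(header);
        if (names.empty()) {
            continue;  // skip blank lines
        }
        if (!std::getline(in, values)) {
            return false;  // header line with no value line
        }
        std::vector<std::string> nums = splitWords(values);
        if (names.size() != nums.size() || names[0] != nums[0]) {
            return false;
        }
        std::string section = names[0];
        if (section.back() == ':') {
            section.pop_back();  // "TcpExt:" -> "TcpExt"
        }
        std::map<std::string, std::uint64_t>& counters = (*out)[section];
        try {
            for (std::size_t i = 1; i < names.size(); ++i) {
                counters[names[i]] = std::stoull(nums[i]);
            }
        } catch (const std::exception&) {
            return false;  // std::stoull throws on non-numeric or out-of-range input
        }
    }
    return true;
}
```

Applied to the sample above, this would yield TcpExt.ListenOverflows = 123805 and IpExt.InOctets = 20992350717. Keying by counter name rather than by position keeps the result meaningful across kernel versions that add fields to these sections.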



 Comments   
Comment by Githook User [ 22/May/18 ]

Author: Bruce Lucas (bdlucas1) <bruce.lucas@10gen.com>

Message: SERVER-31400 Record netstat metrics in ftdc

(cherry picked from commit 68aaf285c35b379a4c81231d86903c78e97d1e76)
Branch: v3.4
https://github.com/mongodb/mongo/commit/0768d9842baea52df153c84c627e63fd3de3683a

Comment by Githook User [ 22/May/18 ]

Author: Bruce Lucas (bdlucas1) <bruce.lucas@10gen.com>

Message: SERVER-31400 Record netstat metrics in ftdc

(cherry picked from commit 68aaf285c35b379a4c81231d86903c78e97d1e76)
Branch: v3.6
https://github.com/mongodb/mongo/commit/9f1a9da5177224d8070bec33f934f6d68bf501b9

Comment by Githook User [ 08/May/18 ]

Author: Bruce Lucas (bdlucas1) <bruce.lucas@10gen.com>

Message: SERVER-31400 Record netstat metrics in ftdc
Branch: master
https://github.com/mongodb/mongo/commit/68aaf285c35b379a4c81231d86903c78e97d1e76

Comment by Bruce Lucas (Inactive) [ 26/Apr/18 ]

I've attached a POC implementation. I've tried to match the style of the adjacent code. It implements the desired functionality but may need some hygiene cleanup, and it still needs a unit test. It has been smoke tested (i.e., I ran a mongod with it for about a minute), and it was observed to add the desired metrics in a form that's consumable by our tooling.

Generated at Thu Feb 08 04:26:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.