[SERVER-31400] Record Linux netstat metrics in ftdc Created: 05/Oct/17 Updated: 30/Oct/23 Resolved: 08/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.16, 3.6.6, 4.0.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Fixed | Votes: | 6 |
| Labels: | SWDI |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6, v3.4 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
We didn't initially include network metrics in ftdc because we weren't aware of any particular diagnostic value. However, we have since discovered that some very long query times can be attributed to specific TCP behavior that can be diagnosed using network metrics on Linux; see SERVER-31251. On Linux this would mean sampling and recording the content of /proc/net/netstat, which looks like this:
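
For illustration only, here is a minimal Python sketch of sampling that file. It assumes the usual layout of /proc/net/netstat, in which each group appears as a pair of lines sharing a prefix (for example `TcpExt:`): first a line of counter names, then a line of values. The function names and the flattened `Prefix:Name` key format are hypothetical and are not taken from the attached patch or from the server code.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, int]:
    """Flatten netstat-style text into {"Prefix:Name": value} (hypothetical key format)."""
    metrics: Dict[str, int] = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines are assumed to come in pairs: counter names first, then their values.
    for header, values in zip(lines[0::2], lines[1::2]):
        head_prefix, _, names = header.partition(":")
        val_prefix, _, numbers = values.partition(":")
        if head_prefix != val_prefix:
            continue  # skip a pair whose prefixes do not match
        for name, number in zip(names.split(), numbers.split()):
            metrics[f"{head_prefix}:{name}"] = int(number)
    return metrics


def sample_netstat(path: str = "/proc/net/netstat") -> Dict[str, int]:
    """Read the file once and return the flattened counters (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())
```

Calling `sample_netstat()` periodically and recording each result would give a time series of counters; because most of these counters are cumulative, a consumer would typically look at the difference between consecutive samples.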
|
| Comments |
| Comment by Githook User [ 22/May/18 ] |
|
Author: {'username': 'bdlucas1', 'name': 'Bruce Lucas', 'email': 'bruce.lucas@10gen.com'}
Message: (cherry picked from commit 68aaf285c35b379a4c81231d86903c78e97d1e76) |
| Comment by Githook User [ 22/May/18 ] |
|
Author: {'username': 'bdlucas1', 'name': 'Bruce Lucas', 'email': 'bruce.lucas@10gen.com'}
Message: (cherry picked from commit 68aaf285c35b379a4c81231d86903c78e97d1e76) |
| Comment by Githook User [ 08/May/18 ] |
|
Author: {'email': 'bruce.lucas@10gen.com', 'name': 'Bruce Lucas', 'username': 'bdlucas1'}
Message: |
| Comment by Bruce Lucas (Inactive) [ 26/Apr/18 ] |
|
I've attached a POC implementation. I've tried to match the style of the adjacent code. It implements the desired functionality, but it may need a little hygiene cleanup and does need a unit test. It has been smoke tested (i.e. I ran a mongod with it for about a minute) and was observed to add the desired metrics in a form that's consumable by our tooling. |