[SERVER-21818] Capture system metrics in FTDC Created: 09/Dec/15  Updated: 25/Apr/17  Resolved: 26/Jul/16

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 3.2.13, 3.3.11

Type: Improvement Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Mark Benvenuto
Resolution: Done Votes: 3
Labels: monitoring
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File sysmon.py    
Issue Links:
Depends
depends on SERVER-24572 Add support for collecting informatio... Closed
depends on SERVER-24605 Add support for collecting informatio... Closed
depends on SERVER-24606 Add support for collecting informatio... Closed
depends on SERVER-24607 Add Collector for system statistics Closed
depends on SERVER-24608 Add Windows performance counter colle... Closed
depends on SERVER-24610 Add FTDC Collector for Windows Perfor... Closed
Documented
is documented by DOCS-9502 Docs for SERVER-21818: Capture system... Closed
Related
is related to SERVER-28953 Capture df (disk full) statistics in ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Platforms 15 (06/03/16), Platforms 18 (08/05/16)
Participants:

 Description   

Currently, full-time data capture (FTDC) includes only internal metrics (with a small number of exceptions). It would be useful to also capture system metrics related to CPU, memory, and storage. For illustrative purposes, the attached proof-of-concept data capture tool, sysmon.py, captures such information on Linux from /proc/stat, /proc/meminfo, and /sys/block/*/stat, and has proven useful for problem diagnosis. Captured information includes the following:

/proc/stat
cpu_user
cpu_nice
cpu_system
cpu_idle
cpu_iowait
cpu_irq
cpu_softirq
cpu_steal
cpu_guest
cpu_guest_nice
ctxt
btime
processes
procs_running
procs_blocked
cpus

/proc/meminfo
memtotal
memfree
buffers
cached
swapcached
active
inactive
active anon
inactive anon
active file
inactive file
dirty

/sys/block/*/stat
sd*.reads
sd*.reads_merged
sd*.read_sectors
sd*.read_time_ms
sd*.writes
sd*.writes_merged
sd*.write_sectors
sd*.write_time_ms
sd*.io_in_progress
sd*.io_time_ms
sd*.io_queued_ms
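
As a rough illustration of how the three sources listed above might be read, here is a minimal Python sketch. It is not the attached sysmon.py and not the FTDC implementation. The function names (parse_proc_stat, parse_meminfo, parse_block_stat, sample_system) are hypothetical, and the key naming is an assumption modeled on the lists above. Note that /proc/meminfo names such as Active(anon) come out as active(anon) here rather than "active anon", and that all block devices are read, not only sd*.

```python
import glob

# Column order of the aggregate "cpu" line in /proc/stat.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice"]
# Single-value lines of /proc/stat that the ticket lists.
STAT_SCALARS = {"ctxt", "btime", "processes", "procs_running", "procs_blocked"}
# Column order of /sys/block/<dev>/stat (first 11 columns).
BLOCK_FIELDS = ["reads", "reads_merged", "read_sectors", "read_time_ms",
                "writes", "writes_merged", "write_sectors", "write_time_ms",
                "io_in_progress", "io_time_ms", "io_queued_ms"]


def parse_proc_stat(text):
    """Parse the contents of /proc/stat into a flat dict of integers."""
    metrics = {}
    cpus = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key, values = parts[0], parts[1:]
        if key == "cpu":
            # Aggregate line; older kernels may report fewer columns.
            for name, value in zip(CPU_FIELDS, values):
                metrics["cpu_" + name] = int(value)
        elif key.startswith("cpu") and key[3:].isdigit():
            cpus += 1  # one "cpuN" line per online CPU
        elif key in STAT_SCALARS:
            metrics[key] = int(values[0])
    metrics["cpus"] = cpus
    return metrics


def parse_meminfo(text):
    """Parse /proc/meminfo into {lowercased name: integer value}."""
    metrics = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            metrics[name.strip().lower()] = int(fields[0])
    return metrics


def parse_block_stat(device, text):
    """Parse one /sys/block/<device>/stat line into prefixed counters."""
    # zip() ignores any extra columns that newer kernels append.
    return {"%s.%s" % (device, name): int(value)
            for name, value in zip(BLOCK_FIELDS, text.split())}


def sample_system():
    """Take one sample from the live Linux files (Linux only)."""
    sample = {}
    with open("/proc/stat") as f:
        sample.update(parse_proc_stat(f.read()))
    with open("/proc/meminfo") as f:
        sample.update(parse_meminfo(f.read()))
    for path in glob.glob("/sys/block/*/stat"):
        device = path.split("/")[3]
        with open(path) as f:
            sample.update(parse_block_stat(device, f.read()))
    return sample
```

The parsers take text rather than file paths so that they can be exercised on captured samples from another machine. Only sample_system touches the live system.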

Similar metrics are available through Windows APIs. Where applicable, cumulative counters are preferred over instantaneous values because cumulative counters can be sampled at arbitrary time intervals. In general, raw system-specific metrics with a minimum of processing are preferred, leaving it to tooling to subsample as needed and compute useful values for display. (One possible exception: sectors could be converted to bytes, because a sector may be a system- or device-specific unit.)
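
To illustrate why cumulative counters tolerate arbitrary sampling intervals, the sketch below shows how downstream tooling might turn two raw samples into per-second rates. The helper names (counter_rates, sectors_to_bytes) are hypothetical, and the 512-byte default sector size is an assumption that matches what Linux documents for /sys/block/*/stat; other systems may differ. Counter resets and wraparound are not handled.

```python
def counter_rates(prev, curr, elapsed_seconds):
    """Average per-second rate of each cumulative counter between two samples.

    prev and curr are dicts of raw cumulative counters taken
    elapsed_seconds apart. Only keys present in both are used.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {key: (curr[key] - prev[key]) / elapsed_seconds
            for key in curr if key in prev}


def sectors_to_bytes(sectors, sector_size=512):
    """Convert a sector count to bytes (sector_size is an assumption)."""
    return sectors * sector_size


# Two samples taken 10 seconds apart; the interval could be anything.
prev = {"sda.read_sectors": 1000, "ctxt": 500}
curr = {"sda.read_sectors": 3000, "ctxt": 800}
rates = counter_rates(prev, curr, 10)
read_bytes_per_second = sectors_to_bytes(rates["sda.read_sectors"])
```

Because the stored values are raw totals, a viewer can pick any two samples, however far apart, and still obtain a correct average rate for that window. An instantaneous gauge sampled at the same two points would say nothing about what happened in between.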



 Comments   
Comment by Mark Benvenuto [ 26/Jul/16 ]

Linux and Windows implementations have been completed.

Comment by Andrew Morrow (Inactive) [ 26/Jul/16 ]

Mark, please resolve this ticket. We held it open as a tracking ticket, but all of the associated work has been completed.

Comment by Mark Benvenuto [ 24/May/16 ]

lucas.hrabovsky Thanks for the reference. sigar supports a lot of operating systems, and a lot of very old versions (like NT4). I will use it as a reference if I need to find the various APIs to gather data.
