  1. Core Server
  2. SERVER-21818

Capture system metrics in FTDC


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.13, 3.3.11
    • Component/s: Diagnostics
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Platforms 15 (06/03/16), Platforms 18 (08/05/16)

      Description

      Currently, full-time data capture (FTDC) includes only internal metrics, with a small number of exceptions. It would be useful to also capture system metrics related to CPU, memory, and storage. For illustration, the attached proof-of-concept data capture tool, sysmon.py, collects this information on Linux from /proc/stat, /proc/meminfo, and /sys/block/*/stat, and has proven useful for problem diagnosis. The captured information includes the following:

      /proc/stat
      cpu_user
      cpu_nice
      cpu_system
      cpu_idle
      cpu_iowait
      cpu_irq
      cpu_softirq
      cpu_steal
      cpu_guest
      cpu_guest_nice
      ctxt
      btime
      processes
      procs_running
      procs_blocked
      cpus
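
To make the /proc/stat list above concrete, here is a minimal Python sketch of how such fields could be extracted. It is not the attached sysmon.py; the function names and the sample text are hypothetical, and it assumes that "cpus" means the number of per-CPU lines (cpu0, cpu1, ...) in the file.

```python
# Column order of the aggregate "cpu" line in /proc/stat.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice"]
# Single-value lines listed in the ticket.
SCALAR_KEYS = {"ctxt", "btime", "processes", "procs_running", "procs_blocked"}

# Made-up sample text in /proc/stat layout, for illustration only.
SAMPLE_PROC_STAT = """cpu  100 2 30 4000 50 0 6 0 0 0
cpu0 60 1 20 2000 30 0 4 0 0 0
cpu1 40 1 10 2000 20 0 2 0 0 0
intr 12345 0 0
ctxt 98765
btime 1450000000
processes 4321
procs_running 2
procs_blocked 0
"""

def parse_proc_stat(text):
    """Return a flat dict of raw counters from /proc/stat-formatted text."""
    metrics = {}
    cpus = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key, values = parts[0], parts[1:]
        if key == "cpu":
            # Aggregate line: keep the raw cumulative tick counts.
            for name, value in zip(CPU_FIELDS, values):
                metrics["cpu_" + name] = int(value)
        elif key.startswith("cpu") and key[3:].isdigit():
            # Per-CPU line: only counted here (assumed meaning of "cpus").
            cpus += 1
        elif key in SCALAR_KEYS and values:
            metrics[key] = int(values[0])
    metrics["cpus"] = cpus
    return metrics

def read_proc_stat(path="/proc/stat"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_proc_stat(f.read())
```

Calling parse_proc_stat(SAMPLE_PROC_STAT) yields keys such as cpu_user, ctxt, and cpus; lines not named in the ticket (for example intr) are skipped.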

      /proc/meminfo
      memtotal
      memfree
      buffers
      cached
      swapcached
      active
      inactive
      active anon
      inactive anon
      active file
      inactive file
      dirty
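
The /proc/meminfo fields above could be collected along the following lines. This is a hedged sketch, not the attached tool: the function name and the key normalization (lowercasing, and turning a name like Active(anon) into active_anon) are assumptions made for illustration. Values are kept in the units the kernel reports (kB where a unit is shown), in keeping with the preference for raw metrics.

```python
def parse_meminfo(text):
    """Return a dict of raw values from /proc/meminfo-formatted text.

    Keys are lowercased and parentheses are flattened, so
    "Active(anon)" becomes "active_anon" (a naming choice, not a standard).
    """
    metrics = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if not fields:
            continue
        key = name.strip().lower().replace("(", "_").replace(")", "")
        # Keep the number as reported; any unit suffix such as "kB" is ignored.
        metrics[key] = int(fields[0])
    return metrics

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Note that these values are instantaneous gauges rather than cumulative counters, so tooling would plot them directly instead of differencing successive samples.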

      /sys/block/*/stat
      sd*.reads
      sd*.reads_merged
      sd*.read_sectors
      sd*.read_time_ms
      sd*.writes
      sd*.writes_merged
      sd*.write_sectors
      sd*.write_time_ms
      sd*.io_in_progress
      sd*.io_time_ms
      sd*.io_queued_ms
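
The eleven per-device names above follow the column order of a /sys/block/<device>/stat line. A minimal sketch of that mapping is shown below; the function names are hypothetical, and the code assumes that any additional columns exposed by newer kernels can simply be ignored.

```python
import glob
import os

# Names from the ticket, in the column order of /sys/block/<dev>/stat.
DISK_FIELDS = ["reads", "reads_merged", "read_sectors", "read_time_ms",
               "writes", "writes_merged", "write_sectors", "write_time_ms",
               "io_in_progress", "io_time_ms", "io_queued_ms"]

def parse_block_stat(device, text):
    """Map one stat line to {"<device>.<field>": value}.

    zip() stops at the shorter sequence, so extra trailing columns
    (present on some newer kernels) are dropped.
    """
    values = text.split()
    return {f"{device}.{name}": int(value)
            for name, value in zip(DISK_FIELDS, values)}

def read_all_block_stats(pattern="/sys/block/*/stat"):
    """Read every matching device on a live Linux system."""
    metrics = {}
    for path in glob.glob(pattern):
        device = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            metrics.update(parse_block_stat(device, f.read()))
    return metrics
```

All fields except io_in_progress are cumulative, so they can be differenced between any two samples.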

      Similar metrics are available through Windows APIs. Where applicable, cumulative counters are preferred over instantaneous values because cumulative counters can be sampled at arbitrary time intervals. In general, raw system-specific metrics with a minimum of processing are preferred, leaving it to tooling to subsample as needed and compute useful values for display. (One possible exception: sectors could be converted to bytes, because a sector may be a system- or device-specific unit.)
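
To illustrate why cumulative counters tolerate arbitrary sampling intervals, the sketch below shows how downstream tooling might turn two raw samples into per-second rates. The function names are hypothetical. The 512-byte sector size is an assumption that matches how Linux reports sectors in /sys/block/*/stat; other systems may differ, which is the reason the conversion is kept as an explicit parameter.

```python
LINUX_STAT_SECTOR_BYTES = 512  # assumed unit of the Linux block stat "sectors" columns

def counter_rates(prev, curr, elapsed_seconds):
    """Per-second rate of each cumulative counter present in both samples.

    Only the two endpoint samples matter, so dropping intermediate
    samples (subsampling) does not change the average rate.
    Counter resets and wraparound are not handled in this sketch.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {key: (curr[key] - prev[key]) / elapsed_seconds
            for key in curr if key in prev}

def sectors_to_bytes(sectors, sector_bytes=LINUX_STAT_SECTOR_BYTES):
    """Convert a sector count (or rate) to bytes."""
    return sectors * sector_bytes

# Made-up samples: cumulative read_sectors at t=0 s and t=10 s.
sample_t0 = {"sda.read_sectors": 1000}
sample_t10 = {"sda.read_sectors": 3000}
rates = counter_rates(sample_t0, sample_t10, 10)
read_bytes_per_second = sectors_to_bytes(rates["sda.read_sectors"])
```

An instantaneous value such as io_in_progress cannot be recovered this way: anything that happens between two samples is lost, which is the motivation for preferring counters where they exist.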

    People

    • Votes: 3
    • Watchers: 22
