[SERVER-38231] Capture CPU utilization for each core in FTDC Created: 22/Nov/18  Updated: 20/Dec/21  Resolved: 19/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Linda Qin Assignee: Kelsey Schubert
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-36822 Report correct number of CPUs Backlog
Related
Participants:

 Description   

Currently FTDC only captures normalized CPU usage. It would be nice if it also captured CPU usage for each core. This would help to identify issues that are bound to a single CPU core, e.g. slow initial sync due to a large number of indexes.

I understand that this could potentially add a large number of metrics for powerful machines. So probably just capture the highest (and lowest) usage for these CPU cores?
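
For illustration only, here is a minimal sketch of how per-core utilization and the proposed highest/lowest summary could be derived on Linux from two snapshots of `/proc/stat`. This is not FTDC's implementation; the function names (`parse_proc_stat`, `per_core_utilization`, `summarize`) are hypothetical, and the sketch assumes the standard `cpuN user nice system idle iowait irq softirq steal ...` line layout.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        # (guest time is already included in user, so it is not added again).
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_utilization(before_text, after_text):
    """Percent busy per core between two snapshots of /proc/stat text."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    result = {}
    for name, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(name, (0, 0))
        elapsed = total_after - total_before
        result[name] = 100.0 * (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return result


def summarize(utilization):
    """Collapse per-core values to two metrics, regardless of core count."""
    return {"max": max(utilization.values()), "min": min(utilization.values())}
```

On a real host the two snapshots would come from reading `/proc/stat` twice, one sampling interval apart. The summary keeps the metric count fixed at two per sample no matter how many cores the machine has.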



 Comments   
Comment by Kelsey Schubert [ 19/May/21 ]

Hi Linda,

Sorry for letting this ticket slip through the cracks. We don't want to accidentally blow up the number of metrics we're looking at, and the individual breakout may not help since the single thread would bounce around, so I'm not particularly inclined to pursue this work versus other diagnostic improvements. I'm going to go ahead and resolve this ticket, but let me know if you think I'm missing something.

Thanks,
Kelsey

Comment by Linda Qin [ 23/Nov/18 ]

I understand that the OS might schedule the thread onto different physical CPU cores, so for a single-threaded workload, the 100% usage could be bouncing around between the different cores.

"... it is also separately reported scaled so that 100% means a single CPU?"

Could you clarify how this will be reported?

  • Is it like what we currently have in CM/OM, which reports both the CPU usage (so for a system with 4 cores, we will get 400% if all CPU cores are busy), and normalized CPU usage (so we will get 100% if all CPU cores are busy)?
  • Or will it report the CPU usage for each individual core? So if there are 4 cores, we will get 4 more metrics (or probably 8 metrics - 4 for process CPU, and 4 for system CPU), one set per core?

If the former, it won't help much I think. For example, if there are 4 cores, the CPU usage of 100% (normalized 25%) won't tell us if:

  • this is a single-threaded workload, so that one core is getting 100% (may bounce around between different cores due to cpu scheduling).
  • or each core is taking 25%, so the workload is not CPU bound.
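
As a rough illustration of this ambiguity (a sketch with made-up per-core numbers, not real FTDC data; `aggregate` is a hypothetical helper), both cases produce identical aggregate values under either scaling, while the per-core maximum tells them apart:

```python
def aggregate(per_core):
    """Return (total, normalized) CPU usage for a list of per-core percentages."""
    total = sum(per_core)                # scaled so that 100% means one busy core
    normalized = total / len(per_core)   # scaled so that 100% means all cores busy
    return total, normalized


# Hypothetical 4-core samples for the two cases described above.
single_threaded = [100.0, 0.0, 0.0, 0.0]   # one core saturated
evenly_spread = [25.0, 25.0, 25.0, 25.0]   # no core saturated

print(aggregate(single_threaded), max(single_threaded))
print(aggregate(evenly_spread), max(evenly_spread))
```

Both calls to `aggregate` return the same pair, so neither scaling alone separates the two workloads.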

The latter would shed some light on whether this is a single-threaded workload - we can check if one core is getting 100% (not a specific one, as this may bounce around between different cores due to CPU scheduling).

The concern for the latter (capturing CPU usage for each core) is that this could potentially add a large number of metrics (e.g. for a system with 32 cores, we could get 32 or 64 more metrics).

  • If this is not an issue, then I think we could consider capturing these metrics.
  • Otherwise, we may consider capturing only the min/max individual CPU usage. For example, if at time T the highest CPU usage is 99% on core1, we capture 99% as the max at time T. If at time T+x the highest CPU usage is 100% on coreY, we capture 100% as the max at time T+x.
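
The min/max option could look roughly like the following sketch. The sample values are invented and `min_max_series` is a hypothetical name. The point is that the max stays near 100% even as the busy core changes from one sample to the next, provided the thread stays on one core for most of each sampling interval.

```python
def min_max_series(samples):
    """For each sample (a list of per-core percentages), keep only (min, max)."""
    return [(min(sample), max(sample)) for sample in samples]


# Hypothetical 4-core samples: the busy core changes between samples.
samples = [
    [99.0, 1.0, 2.0, 0.0],    # time T: core 0 is busiest
    [3.0, 0.0, 100.0, 1.0],   # time T+x: core 2 is busiest
    [0.0, 98.0, 2.0, 4.0],    # time T+2x: core 1 is busiest
]

for low, high in min_max_series(samples):
    print(f"min={low:.0f}% max={high:.0f}%")
```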

Comment by Bruce Lucas (Inactive) [ 22/Nov/18 ]

Single-threaded workloads won't necessarily be bound to a single core, so I don't think this gives you any more information than looking at the overall CPU utilization. Would it help if, in addition to reporting CPU scaled so that 100% means all cores are busy, it is also separately reported scaled so that 100% means a single CPU?

Generated at Thu Feb 08 04:48:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.