[SERVER-73237] Collect PSI (Pressure Stall Information) in FTDC Created: 24/Jan/23 Updated: 30/Jan/23 Resolved: 30/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Miguel Angel Nieto | Assignee: | Backlog - Security Team |
| Resolution: | Duplicate | Votes: | 5 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Server Security |
| Description |
|
Hello, recent Linux kernel versions (4.20 and later) include PSI (Pressure Stall Information), which is very useful for understanding pressure on resources such as CPU, memory, and I/O.
We only need to read `/proc/pressure/resource_name`, where resource_name can be `cpu`, `memory`, or `io` (storage). This information would be really helpful when analyzing problems for our customers, because it would give us a good measure of resource pressure leading up to the event being investigated. In the future, once this is implemented, we could even graph it in our tools and in the Atlas interface to see whether clusters are close to the stall point, or use it to alert customers before the stall itself happens. Let me know if you have any questions. Regards. |
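A minimal sketch of what reading these files involves, written in Python for brevity (a real collector would presumably live in the server's C++ FTDC code). It assumes the standard kernel PSI format: a `some` line and, on most resources and kernels, a `full` line, each carrying `avg10`, `avg60`, `avg300` (percentage of wall time stalled over 10, 60, and 300 seconds) and `total` (cumulative stall time in microseconds). The names `parse_psi`, `read_psi`, and `PSI_RESOURCES` are hypothetical, not existing server code.

```python
from pathlib import Path

# Hypothetical list of PSI resources to collect (files under /proc/pressure/).
PSI_RESOURCES = ("cpu", "memory", "io")


def parse_psi(text):
    """Parse the contents of one /proc/pressure/<resource> file.

    Each line looks like:
        some avg10=0.12 avg60=0.05 avg300=0.01 total=123456
    avg* values are percentages (floats); total is cumulative stall time in
    microseconds (int). Does not assume a "full" line is present, since older
    kernels only report "some" for cpu.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is "some" or "full"
        metrics = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            metrics[key] = int(value) if key == "total" else float(value)
        result[kind] = metrics
    return result


def read_psi(base=Path("/proc/pressure")):
    """Read every available PSI resource under `base`.

    Resources whose file cannot be read (old kernel, PSI disabled) are
    skipped, so the result may be empty.
    """
    samples = {}
    for resource in PSI_RESOURCES:
        try:
            samples[resource] = parse_psi((base / resource).read_text())
        except OSError:
            continue
    return samples
```

Skipping unreadable files rather than raising seems like the right behavior for a diagnostic collector: on a kernel without PSI the section would simply be absent instead of breaking collection.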
| Comments |
| Comment by Miguel Angel Nieto [ 27/Jan/23 ] |
|
Thinking about this, I guess that if we want other tools (such as the monitoring agent or the Atlas interface) to use this information, it would need to be exposed in serverStatus. |
| Comment by Eric Sedor [ 26/Jan/23 ] |
|
TBD whether this goes in serverStatus or just in one of the FTDC collectors |
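Wherever it ends up, periodic collection makes the cumulative `total` counter useful on its own: the difference between two samples divided by the elapsed time gives the share of that interval spent stalled, at the collection resolution rather than only the kernel's fixed 10/60/300-second averages. A hypothetical helper, sketched in Python under that assumption (`stall_percent` is not existing server code):

```python
def stall_percent(prev_total_us, curr_total_us, interval_s):
    """Percentage of wall time stalled between two samples of a PSI "total" counter.

    PSI "total" is cumulative stall time in microseconds, so the delta between
    two periodic samples divided by the elapsed wall time (in microseconds)
    gives the stalled share of that interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_us = curr_total_us - prev_total_us
    if delta_us < 0:
        raise ValueError("PSI total counter went backwards")
    return 100.0 * delta_us / (interval_s * 1_000_000)
```

For example, two samples taken one second apart whose `total` values differ by 250000 microseconds correspond to 25% of that second spent stalled.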