[SERVER-45255] Capture Pressure Stall Information in FTDC for Linux hosts Created: 19/Dec/19 Updated: 29/Nov/23 Resolved: 28/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 7.0.0-rc6, 6.0.8 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kevin Arhelger | Assignee: | Adrian Gonzalez Montemayor |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | RDY, former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: |
Server Security
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: |
v7.0, v6.3, v6.0
|
| Sprint: | Security 2023-05-01 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
In newer kernels (for example, the one shipped with RHEL 8.1), system-wide Pressure Stall Information (PSI) is available under /proc/pressure. On systems that support it, capturing this data in FTDC could be a valuable way to spot system-level issues more quickly. https://www.kernel.org/doc/html/latest/accounting/psi.html |
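For illustration only, a minimal Python sketch of what reading these files involves. It assumes the line format described in the kernel documentation linked above (`some`/`full` lines carrying `avg10`, `avg60`, `avg300`, and `total` fields). The function names `parse_psi` and `read_all_psi` are hypothetical and are not taken from the server code; the actual FTDC collector may be structured quite differently.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the contents of one /proc/pressure/<resource> file.

    Each line is expected to look like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),
            "avg60": float(values["avg60"]),
            "avg300": float(values["avg300"]),
            "total": int(values["total"]),  # cumulative stall time
        }
    return result


def read_all_psi(root="/proc/pressure"):
    """Read cpu, memory, and io pressure; skip resources that cannot be read."""
    out = {}
    for resource in ("cpu", "memory", "io"):
        try:
            out[resource] = parse_psi(Path(root, resource).read_text())
        except OSError:
            # File missing or PSI disabled: leave this resource out.
            continue
    return out
```

On a host without PSI, `read_all_psi()` simply returns an empty dictionary, which is one way a collector could degrade gracefully.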
| Comments |
| Comment by Githook User [ 28/Jun/23 ] |
|
Author: {'name': 'Adrian Gonzalez', 'email': 'adriangonzalezmontemayor@gmail.com', 'username': 'adriangzz'}Message: |
| Comment by Githook User [ 22/Jun/23 ] |
|
Author: {'name': 'Adrian Gonzalez', 'email': 'adriangonzalezmontemayor@gmail.com', 'username': 'adriangzz'}Message: (cherry picked from commit 136235a05516e7f2d56dc4eefa3ffb1ee04dee5b) |
| Comment by Rachelle Palmer [ 20/Jun/23 ] |
|
Requesting backport for 6.0 series, thank you! |
| Comment by Githook User [ 28/Apr/23 ] |
|
Author: {'name': 'Adrian Gonzalez', 'email': 'adriangonzalezmontemayor@gmail.com', 'username': 'adriangzz'}Message: |
| Comment by Ger Hartnett [ 11/Jan/23 ] |
|
Atlas Graviton is now running on AL2 with a kernel of 5.10+ |
| Comment by Mark Benvenuto [ 21/May/21 ] |
|
Pressure Stall Information is not available in Amazon Linux 2. AL2 uses kernel 4.14, but PSI was added in 4.20. |
| Comment by Mark Benvenuto [ 16/Jan/20 ] |
|
While RHEL 8.1 has PSI, it is not on by default. There is a kernel config setting CONFIG_PSI_DEFAULT_DISABLED. On RHEL 8.1, it is set to "y" which means PSI is disabled by default. In order to enable it, a customer has to edit their grub config. References: |
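As a rough sketch of how a process might tell these cases apart at runtime, the Python snippet below probes one PSI file. It assumes that a kernel built with PSI but booted with it disabled reports "Operation not supported" (EOPNOTSUPP) when the file is opened or read; that assumption should be verified on the target kernel. The function name `psi_status` and its return strings are hypothetical. The kernel documents a `psi=` boot parameter for turning PSI on or off.

```python
import errno


def psi_status(path="/proc/pressure/cpu"):
    """Classify PSI support as 'missing', 'disabled', or 'available'."""
    try:
        with open(path) as f:
            f.read()
    except FileNotFoundError:
        # Kernel has no PSI support, or did not create the file.
        return "missing"
    except OSError as exc:
        if exc.errno == errno.EOPNOTSUPP:
            # Assumed to mean: PSI compiled in but disabled at boot.
            return "disabled"
        raise
    return "available"
```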
| Comment by Mark Benvenuto [ 15/Jan/20 ] |
|
PSI support was added in Linux 4.20. The polling interface was added in 5.2. Red Hat backported it to their Linux 4.18 kernel in RHEL 8.1 as part of RHBZ# 1678388. Also, the only OS that we commercially support that includes this is RHEL 8.1 (Ubuntu 18.04 is too old). The forthcoming Ubuntu 20.04 should have support for this, though (they are testing on 5.4 in Launchpad). We can get down to a 500ms window size by using the poll() interface. This is better than the 10sec granularity provided by default when a file is read. If we decide to add support, we should use the poll() interface in a dedicated thread. I am not sure what thresholds to use (should we look for stalls as short as 50ms?). Our dedicated thread would then set counters to indicate that stalls occurred, their type (cpu, memory, io), and their effect (some vs. full). In my ad-hoc testing, though, I could not get it working on a RHEL 8.1 machine in EC2 that I had upgraded from RHEL 8: I was getting "Operation not supported" on reads and writes to the files under /proc/pressure. I was able to test it successfully on Fedora 31 with 5.3.7, however. References: |
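A minimal Python sketch of the poll() approach described in this comment, following the trigger format in the kernel PSI documentation (`<some|full> <stall amount in us> <time window in us>`, written to the pressure file, then waiting for `POLLPRI`). The function names `build_trigger` and `wait_for_stall`, and the default threshold of 50ms within a 500ms window, are illustrative assumptions only; they are not the thresholds or code the server adopted.

```python
import os
import select

# Window limits stated in the kernel PSI documentation (microseconds).
MIN_WINDOW_US = 500_000
MAX_WINDOW_US = 10_000_000


def build_trigger(kind, stall_us, window_us):
    """Build the NUL-terminated trigger string written to a PSI file."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not MIN_WINDOW_US <= window_us <= MAX_WINDOW_US:
        raise ValueError("window must be between 500ms and 10s")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must be positive and fit in the window")
    return f"{kind} {stall_us} {window_us}".encode() + b"\0"


def wait_for_stall(resource, kind="some", stall_us=50_000,
                   window_us=500_000, timeout_ms=None,
                   root="/proc/pressure"):
    """Block until the trigger fires; return False if the timeout expires."""
    fd = os.open(os.path.join(root, resource), os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, build_trigger(kind, stall_us, window_us))
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _fd, events in poller.poll(timeout_ms):
            if events & select.POLLERR:
                raise OSError("PSI event source is gone")
            if events & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)
```

A dedicated thread could call something like `wait_for_stall("memory")` in a loop and increment a counter per resource and per kind each time it returns True. Registering a trigger may require elevated privileges depending on the kernel version.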