[SERVER-54472] Collect additional FTDC metrics during shutdown for replica sets Created: 11/Feb/21 Updated: 29/Oct/23 Resolved: 31/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | Execution Team 2021-03-22, Execution Team 2021-04-05 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
In There is some additional system-related information in serverStatus that would be useful to collect, such as tcmalloc and extra_info. This would allow us to diagnose CPU and memory problems during shutdown, and to monitor memory usage including shutdown in our performance tests. It should be possible to collect this information since it is system-related and not (logically) dependent on MongoDB-specific data structures. |
| Comments |
| Comment by Githook User [ 31/Mar/21 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: |
| Comment by Gregory Wlodarek [ 30/Mar/21 ] |
|
I've found the issue. I tested my changes against a standalone node only and not a replica set where there's a minor difference when shutting down. I'll use this ticket to fix that. I attached sample metrics with the fix. |
| Comment by Bruce Lucas (Inactive) [ 25/Feb/21 ] |
|
Maybe, but it looks to me like the presence of the ShutdownInProgress error as the (entire) content of serverStatus during this time suggests to me that the problem doesn't originate in WT per se. |
| Comment by Gregory Wlodarek [ 25/Feb/21 ] |
|
bruce.lucas sorry for getting back to you late. This is an interesting case, I wonder if during the WT shutdown something is preventing serverStatus from running properly. I think this is worth investigating for us. |
| Comment by Bruce Lucas (Inactive) [ 19/Feb/21 ] |
|
The content of serverStatus seen in FTDC during shutdown in the above case consists of just the following fields: ok: 0 |
| Comment by Bruce Lucas (Inactive) [ 18/Feb/21 ] |
|
gregory.wlodarek if so then I may have misinterpreted what I was seeing in
A marks the point where we start shutting down WT. The system memory metrics (from the systemMetrics section of ftdc) shows the increasing memory that WT is using during shutdown, but the tcmalloc and resident memory stats (from serverStatus) are showing as 0. This is a little different from your experiment - the delay here is in WT shutdown, whereas in your experiment you introduced a delay after shutdown. I wonder if it's possible that behavior during WT shutdown is what's problematic. In case it matters, the above is on build 4.9.0-alpha-1214-g54e8728. |
| Comment by Gregory Wlodarek [ 18/Feb/21 ] |
|
bruce.lucas I attached screenshots from t2 of metrics we continue to collect after the storage engine shutdown. In the screenshots, the server was idle for a minute then I hung shutdown after the storage engine shutdown for another minute. Both tcmalloc and extra_info continue to be collected. Am I missing something? |
| Comment by Bruce Lucas (Inactive) [ 11/Feb/21 ] |
|
|