[SERVER-27692] Can diagnostic data capture be stopped later in the shutdown process?
Created: 16/Jan/17  Updated: 27/Oct/23  Resolved: 11/Sep/23

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Joanna Cheng
Assignee: Backlog - Security Team
Resolution: Gone away
Votes: 4
Labels: SWDI, move-sec
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-48221 Shut down ftdc after storage engine Closed
is related to WT-3223 Output progress messages during long ... Closed
Assigned Teams:
Server Security
Participants:

 Description   

Sometimes shutdown takes a long time; in these cases it's possible for the diagnostic data collection to be stopped well before the mongod actually stops. This can make it hard to figure out what's happening during the latter stages of the slow shutdown.

If possible, can we make diagnostic data collection stop later in the shutdown process?
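
To make the request concrete, here is a minimal sketch of the ordering being asked for, in which the diagnostic collector is the last thing stopped (the linked SERVER-48221 is titled "Shut down ftdc after storage engine"). This is not mongod's shutdown code: the step names and the helpers `run_shutdown` and `order_with_diagnostics_last` are hypothetical and only illustrate the idea.

```python
from typing import Callable, List, Tuple

# A shutdown step is a (name, callable) pair. All names here are hypothetical.
Step = Tuple[str, Callable[[], None]]


def order_with_diagnostics_last(steps: List[Step],
                                diagnostics_name: str = "diagnostics") -> List[Step]:
    """Return the steps with the diagnostics step moved to the end.

    The relative order of all other steps is preserved.
    """
    others = [s for s in steps if s[0] != diagnostics_name]
    diagnostics = [s for s in steps if s[0] == diagnostics_name]
    return others + diagnostics


def run_shutdown(steps: List[Step]) -> List[str]:
    """Run each step in order and return the names in execution order."""
    executed = []
    for name, step in steps:
        step()
        executed.append(name)
    return executed


if __name__ == "__main__":
    steps = [
        ("diagnostics", lambda: None),     # stopped early in this ordering
        ("network", lambda: None),
        ("storage_engine", lambda: None),  # the slow final checkpoint would happen here
    ]
    print(run_shutdown(order_with_diagnostics_last(steps)))
```

With this ordering the collector is still running while the storage engine step executes, which is the window the description says is currently hard to observe.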



 Comments   
Comment by Alexander Gorrod [ 16/Mar/17 ]

I've opened WT-3223 to add progress messaging to WiredTiger checkpoints, since that is a different issue from continuing to capture diagnostic data during that period.

Comment by John Murphy [ 16/Mar/17 ]

Every 30 seconds would be my vote; however, if a substantial amount of data is being logged, once per minute seems ideal.

Comment by Alexander Gorrod [ 16/Mar/17 ]

Would once per minute be a reasonable cadence for logging the additional information?

Comment by John Murphy [ 16/Mar/17 ]

If WiredTiger were able to log progress during the final checkpoint (and potentially other parts of the WiredTiger engine shutdown), this would provide very useful information when troubleshooting shutdown issues.
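
As an illustration of the kind of progress logging proposed here, the following is a minimal sketch of a long-running loop that emits a message at a fixed cadence. It is not WiredTiger code: `checkpoint_with_progress`, `write_page`, and the message text are hypothetical, and the 60-second default only mirrors the once-per-minute cadence discussed in this thread.

```python
import time
from typing import Callable, Iterable


def checkpoint_with_progress(pages: Iterable[int],
                             write_page: Callable[[int], None],
                             log: Callable[[str], None],
                             interval_secs: float = 60.0,
                             clock: Callable[[], float] = time.monotonic) -> int:
    """Write every page, logging progress at most once per interval.

    `clock` is injectable so the cadence can be tested without sleeping.
    Returns the number of pages written.
    """
    written = 0
    last_log = clock()
    for page in pages:
        write_page(page)
        written += 1
        now = clock()
        # Only log when at least `interval_secs` have passed since the last message.
        if now - last_log >= interval_secs:
            log(f"checkpoint in progress: {written} pages written")
            last_log = now
    log(f"checkpoint complete: {written} pages written")
    return written
```

Checking the elapsed time after each unit of work keeps the logging cost bounded regardless of how many pages are written, which addresses the concern about logging a substantial amount of data.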

Comment by Alexander Gorrod [ 16/Jan/17 ]

joanna.cheng The most common reason for shutdown to take a long time is that the WiredTiger storage engine creates a new checkpoint on shutdown. Creating this checkpoint makes subsequent startups faster and reduces the amount of disk space required, because it allows old journal files to be removed. If the user is running an I/O-bound workload, it can take a long time to create the final checkpoint.

It is likely that any diagnostic information capture would be unable to get further information from WiredTiger during this shutdown period, so continuing to capture diagnostic information is unlikely to be useful.

Generated at Thu Feb 08 04:15:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.