[SERVER-25042] Start diagnostic data collection as early as possible Created: 13/Jul/16  Updated: 07/Feb/23  Resolved: 11/Oct/21

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 5.1.0-rc1

Type: Improvement Priority: Critical - P2
Reporter: Bruce Lucas (Inactive) Assignee: Sara Golemon
Resolution: Done Votes: 6
Labels: SWDI, move-sec, platforms-re-triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
Problem/Incident
Related
related to WT-9373 Allow MongoDB to access WT stats duri... Closed
is related to SERVER-48221 Shut down ftdc after storage engine Closed
Backwards Compatibility: Minor Change
Backport Requested:
v5.0
Sprint: Security 2021-09-06, Security 2021-09-20, Security 2021-10-04, Security 2021-10-18
Participants:
Case:
Linked BF Score: 133

 Description   

Currently, diagnostic data collection (FTDC) does not begin until fairly late in the startup sequence, after potentially significant activity such as oplog stones and interrupted index builds has already taken place, so it can't be used to diagnose problems early in startup. Ideally it should start as early as possible. One dependency is storage engine initialization: diagnostic data collection gathers WT internal statistics, which aren't available until the storage engine is initialized, so it could perhaps start right after storage engine initialization.
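
To illustrate the ordering being asked for, here is a minimal sketch in Python. The step names are hypothetical labels for the activities mentioned above, not mongod's real function names, and the list is a simplification of the actual startup sequence. It only shows the constraint: diagnostic data collection moves to immediately after storage engine initialization, so the later startup steps become observable.

```python
# Hypothetical step names for illustration only; not mongod's real startup code.
LATE_START = [
    "init_storage_engine",
    "oplog_stones",
    "interrupted_index_builds",
    "start_ftdc",
]


def move_ftdc_earlier(steps, ftdc="start_ftdc", dependency="init_storage_engine"):
    """Return a new ordering with the FTDC step placed right after its dependency."""
    if ftdc not in steps or dependency not in steps:
        raise ValueError("both the FTDC step and its dependency must be present")
    reordered = [step for step in steps if step != ftdc]
    reordered.insert(reordered.index(dependency) + 1, ftdc)
    return reordered


def steps_observed_by_ftdc(steps, ftdc="start_ftdc"):
    """Return the steps that run after FTDC has started, i.e. those it can record."""
    return steps[steps.index(ftdc) + 1:]


EARLY_START = move_ftdc_earlier(LATE_START)
```

With the late ordering, `steps_observed_by_ftdc(LATE_START)` is empty. With the reordered list it covers the oplog stones and index build steps, which is the visibility this ticket asks for.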



 Comments   
Comment by Yujin Kang Park [ 12/Jul/22 ]

Requesting a backport to v5.0, given that we have TSAN failures (BF-25790) due to a race condition when accessing flow control stats. If backporting all of the changes is risky, only the exclusion from TSAN should be backported.

Comment by Eric Milkie [ 08/Nov/21 ]

Eric Milkie, now that this is complete, is there a WT ticket we should file?

I would first determine whether we have sufficient FTDC stats already, while RTS is running. Sara, is there example FTDC output when server startup takes a long time due to a lengthy RTS phase?
(We could test this by artificially pinning the stable timestamp so that it cannot move forward, performing lots of writes, and then restarting mongod.)

Comment by Githook User [ 12/Oct/21 ]

Author:

{'name': 'Sara Golemon', 'email': 'sara.golemon@mongodb.com', 'username': 'sgolemon'}

Message: Revert "SERVER-25042 Start up FTDC earlier during server startup"

Partial revert of mongos portions of SERVER-25042
Branch: master
https://github.com/mongodb/mongo/commit/714523009107af8aebc7f142518c73a5c61c33b3

Comment by Githook User [ 11/Oct/21 ]

Author:

{'name': 'Sara Golemon', 'email': 'sara.golemon@mongodb.com', 'username': 'sgolemon'}

Message: SERVER-25042 Start up FTDC earlier during server startup
Branch: master
https://github.com/mongodb/mongo/commit/16839edee101c1b36b6a3d1b6e33a5ec3400c40d

Comment by Githook User [ 13/Sep/21 ]

Author:

{'name': 'Sara Golemon', 'email': 'sara.golemon@mongodb.com', 'username': 'sgolemon'}

Message: Revert "SERVER-25042 Start up FTDC earlier during server startup"

This reverts commit b7d29c204f0a4b62fc1d9bccc3ec341bbeed330c.
Branch: master
https://github.com/mongodb/mongo/commit/d3064330ba2eac458cff9ecc7f42642e868becff

Comment by Sara Golemon [ 13/Sep/21 ]

Reopening due to revert.

Comment by Githook User [ 10/Sep/21 ]

Author:

{'name': 'Sara Golemon', 'email': 'sara.golemon@mongodb.com', 'username': 'sgolemon'}

Message: SERVER-25042 Start up FTDC earlier during server startup
Branch: master
https://github.com/mongodb/mongo/commit/b7d29c204f0a4b62fc1d9bccc3ec341bbeed330c

Comment by Ian Whalen (Inactive) [ 31/Aug/20 ]

This is also something we need to come back to after we resolve SERVER-48221.

Generated at Thu Feb 08 04:08:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.