Summary
What is the problem or use case, what are we trying to achieve?
Particularly with the introduction of durable history, startup recovery performance has needed to be investigated. It would be useful to be able to extract WT metrics during this recovery step to aid investigation.
Motivation
- Does this affect any team outside of WT?
(Are they blocked? Are they waiting for an answer?)
Yes. Triage teams are tasked with understanding a problem, and long pauses at MDB startup without progress reports have been tricky for them to understand and route to the right team. It's unclear whether they can use this information to better understand the problem; I expect this to primarily help WT engineers.
- How likely is it that this use case or problem will occur?
(Main path? Edge case? Frequency of the issue?)
The code path in question is startup and shutdown. Presumably when either:
- startup is performing crash recovery
- the stable timestamp at shutdown has significantly lagged the most recent writes
- If the problem does occur, what are the consequences and how severe are they?
(A minor annoyance at a log message? Performance concern? Outage/unavailability? Test Failure?)
The problem of slow startup can manifest as unavailability.
- Is this issue urgent?
(Does this ticket have a required timeline? What is it?)
This issue of slow startup is often investigated when releasing a new MDB version. Given that a release has recently happened, this is not immediately urgent.
Acceptance Criteria (Definition of Done)
(When will this ticket be considered done? What is the acceptance criteria for this ticket to be closed?)
This ticket is done when WT metrics can be gathered during startup and shutdown, whether through the existing cursor APIs on the "stats:" table or by some other means.
- Testing
(What testing needs to be done as part of this ticket? Unit? Functional? Performance? Testing on the MongoDB side?)
Functional testing on the WT side.
- Documentation update
(Does this ticket require a change in the architecture guide? If yes, please create a corresponding doc ticket.)
This would presumably require new WT public documentation, but not necessarily anything for the architecture guide.
[Optional] Suggested Solution
(Is there any suggested solution to handle this issue? Is it related to any existing WT ticket? Is it related to any previous issue fixed? If yes, link the WT ticket number using related to, depends on, dependent on by links)
PR to be attached shortly
- causes: SERVER-69851 Align with the new WiredTiger event handler interface (Closed)
- is duplicated by: WT-9373 Allow MongoDB to access WT stats during open and shutdown (Closed)
- is related to: SERVER-70031 Ensure WT is open when generating WiredTiger statistics. (Closed)