[SERVER-63015] Capture metrics about time spent multiplanning Created: 26/Jan/22 Updated: 28/Jul/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | 5.2.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kevin Arhelger | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | query-offsite | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query Integration
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
This request is for an addition to the slow query message. Having a metric in server status reflecting the time spent planning queries would give a strong signal if there is something wrong. Perhaps a metric like "timeAcquiringMicros" for the locking statistics would be helpful to determine how efficient planning is. Reporting the time spent planning a query in the slow query message would greatly assist query diagnostics. Currently a query taking excessive time planning will not reflect in any of the current metrics, hiding any potential impact from planning delays. |
| Comments |
| Comment by Charlie Swanson [ 24/Mar/23 ] | |||||||||||||||
|
Moving this out of the current telemetry epic which has been reduced in scope. This work still seems relevant and I think should be tackled in our planned follow up project, so I moved it there. Note that the scope hasn't been drafted/approved yet, so I'm not 100% sure it will include planning time as a telemetry metric. | |||||||||||||||
| Comment by Kevin Arhelger [ 31/Jan/22 ] | |||||||||||||||
|
Here is the general idea.
serverStatus example:
Slow query example:
Nice to have:
serverStatus example:
A delta between these two counters would allow us to determine how many planning events are currently in-flight, and would allow easier diagnostics for abnormally long planning phases. Nice to have but likely difficult:
This would help with a plan that is running but does not finish before the server terminates (SIGKILL, SIGSEGV, pulling the power plug). There are a few things that aren’t included here that could be separate requests:
Please let me know if there is any additional color I can add. | |||||||||||||||
| Comment by Ana Meza [ 31/Jan/22 ] | |||||||||||||||
|
Hi kevin.arhelger could you please add some details about the solution are looking for? | |||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 26/Jan/22 ] | |||||||||||||||
|
This sounds useful. The serverStatus/FTDC metric should follow the usual design pattern and be a cumulative time. |