Add new multi-planner histograms and aggregate metrics to serverStatus (aggregate metrics only in FTDC)
SERVER-62150 describes a scenario where SBE multi-planning can be slow relative to the classic engine's multi-planning implementation. We implemented SERVER-62981 to mitigate this issue and have also proposed SERVER-63641 as an additional improvement. To make sure that customers are experiencing good SBE multi-planner performance, we should add metrics to serverStatus. Before implementing this ticket, we need to agree on exactly which metrics to capture and how they will be exposed in serverStatus. The current proposal is to collect histograms of both the number of storage reads performed during SBE multi-planning and the overall wall-clock time spent multi-planning.
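To make the discussion of "what to capture" more concrete, here is a minimal Python sketch of a bucketed histogram of the kind proposed above. It is an illustration only, not the server implementation: the class name, the bucket boundaries, and the report field names (`lowerBounds`, `counts`, `sum`) are all hypothetical and would need to be agreed on as part of this ticket.

```python
import bisect


class BucketedHistogram:
    """Counts observations into buckets defined by ascending lower bounds.

    Hypothetical sketch: bucket i holds values v with
    lower_bounds[i] <= v < lower_bounds[i + 1]; the last bucket is open-ended.
    """

    def __init__(self, lower_bounds):
        bounds = list(lower_bounds)
        if not bounds or bounds != sorted(set(bounds)):
            raise ValueError("lower_bounds must be non-empty and strictly ascending")
        self.lower_bounds = bounds
        self.counts = [0] * len(bounds)
        self.total = 0  # running sum, usable as a single aggregate metric

    def record(self, value):
        # bisect_right returns the number of bounds <= value, so subtracting 1
        # gives the bucket index. Values below the first bound go to bucket 0.
        idx = max(0, bisect.bisect_right(self.lower_bounds, value) - 1)
        self.counts[idx] += 1
        self.total += value

    def report(self):
        # Assumed output shape; the real serverStatus layout is still to be decided.
        return {
            "lowerBounds": list(self.lower_bounds),
            "counts": list(self.counts),
            "sum": self.total,
        }


# Example: storage reads observed during four multi-planning runs.
reads_histogram = BucketedHistogram([0, 100, 1000])
for num_reads in (5, 100, 999, 5000):
    reads_histogram.record(num_reads)
print(reads_histogram.report())
```

The same structure could back both proposed histograms (storage reads and wall-clock time) with different bucket boundaries. Keeping a running sum alongside the bucket counts is one way to expose a single aggregate value where the full histogram is not wanted.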
We may wish to collect similar information for the classic multi-planner as well as the SBE multi-planner. There are known scenarios in which the classic multi-planner can take a long time to complete. In particular, see SERVER-31078.
The intended audience of these metrics is query engineering and query product management. We want to be able to analyze the performance of multi-planning across the Atlas fleet in order to inform our decisions about future improvements to the server. These metrics would probably also be useful in support scenarios (e.g. checking whether a customer is running many queries that take a long time to multi-plan), but this is not the primary use case.