- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
- ALL
- Query Execution 2021-07-12, Query Execution 2021-07-26
When explain runs with the "allPlansExecution" verbosity, it is supposed to report runtime statistics from each of the candidate plans collected during the multi-planning trial period. To provide some quick background information: MongoDB currently selects a winning plan by generating a set of possible candidate plans and partially executing these candidate plans for a so-called "trial period". Statistics are collected during this trial period, and used to score each plan.
The "allPlansExecution" section in explain is designed to report the runtime statistics collected during this trial period. It is frequently used by our technical support team in order to help customers understand why a particular plan was chosen. For customer support, it is essential that we have sufficient diagnostic information to determine why a particular plan was chosen over another. The "allPlansExecution" output is the main tool at our disposal for doing so.
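For readers who want to inspect this output themselves, the following is a minimal Python sketch, not part of the original report. It builds an explain command document at "allPlansExecution" verbosity and pulls a few per-plan counters out of the result. The helper names `build_explain_command` and `summarize_trial_stats` are hypothetical, and the counter names (`nReturned`, `totalKeysExamined`, `totalDocsExamined`) are assumed to appear in each `executionStats.allPlansExecution` entry, which may vary by server version and execution engine.

```python
from typing import Any, Dict, List


def build_explain_command(collection: str,
                          query_filter: Dict[str, Any],
                          sort: Dict[str, int]) -> Dict[str, Any]:
    """Wrap a find command in an explain command at allPlansExecution verbosity.

    Hypothetical helper: it only builds the command document and does not
    talk to a server.
    """
    return {
        "explain": {"find": collection, "filter": query_filter, "sort": sort},
        "verbosity": "allPlansExecution",
    }


def summarize_trial_stats(explain_output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract a few per-plan counters from executionStats.allPlansExecution.

    Assumption: each entry carries the counters named below. Missing
    counters are reported as None rather than raising.
    """
    plans = explain_output.get("executionStats", {}).get("allPlansExecution", [])
    fields = ("nReturned", "totalKeysExamined", "totalDocsExamined")
    return [{name: plan.get(name) for name in fields} for plan in plans]
```

With a driver such as PyMongo, the command document could presumably be sent with `db.command(build_explain_command("coll", {"a": 1}, {"b": 1}))` and the reply passed to `summarize_trial_stats`. The collection and field names here are placeholders.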
The problem is that when SBE is enabled, we can sometimes incorrectly report the stats for the winning plan in the "allPlansExecution" section. In particular, this can happen when the winning plan has a SORT stage. In order to ensure that the trial period does not run for too long when there is a sort stage, we have logic in sbe::SortStage to throw a special QueryTrialRunCompleted exception after the candidate plan has already performed a certain amount of work. This exception is caught by the multi-planning code, and at this point the SBE tree should contain statistics describing the trial period accurately.
However, before the statistics are serialized to BSON and added to the "allPlansExecution" section, the SBE multi-planner closes and re-opens the winning plan. This time, opening the plan tree does not exit early with a QueryTrialRunCompleted exception (because the TrialRunTracker pointer has been cleared). As a result, the plan may run for much longer than it did during the actual trial period. In turn, the stats reported in the "allPlansExecution" section will not reflect the actual trial period and will be more or less useless for understanding why this plan was chosen as the winner. See "Steps to Reproduce" for an example.
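The sequence described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the server's C++ implementation: the class names borrow from the description (QueryTrialRunCompleted, TrialRunTracker, a sort stage), but `ToySortStage`, `run_trial_then_reopen`, the work budget, and the choice to restart the counter on every open are all hypothetical.

```python
from typing import List, Optional, Tuple


class QueryTrialRunCompleted(Exception):
    """Toy stand-in for the exception that ends a trial period early."""


class TrialRunTracker:
    """Toy tracker that allows a fixed budget of work units during the trial."""

    def __init__(self, max_work: int) -> None:
        self.max_work = max_work
        self.work_done = 0

    def track(self) -> bool:
        """Record one unit of work; return True once the budget is used up."""
        self.work_done += 1
        return self.work_done >= self.max_work


class ToySortStage:
    """Toy blocking sort: open() consumes its whole input before returning."""

    def __init__(self, rows: List[int], tracker: Optional[TrialRunTracker]) -> None:
        self.rows = rows
        self.tracker = tracker
        self.rows_consumed = 0  # the runtime statistic explain would report
        self.sorted_rows: List[int] = []

    def open(self) -> None:
        # Assumption of this toy model: the counter restarts on every open.
        self.rows_consumed = 0
        buffered: List[int] = []
        for row in self.rows:
            buffered.append(row)
            self.rows_consumed += 1
            # With a tracker attached, stop once the trial budget is spent.
            if self.tracker is not None and self.tracker.track():
                raise QueryTrialRunCompleted()
        self.sorted_rows = sorted(buffered)

    def close(self) -> None:
        self.sorted_rows = []


def run_trial_then_reopen(rows: List[int], max_work: int) -> Tuple[int, int]:
    """Return (stats at end of trial, stats after the close and re-open)."""
    stage = ToySortStage(rows, TrialRunTracker(max_work))
    try:
        stage.open()
    except QueryTrialRunCompleted:
        pass  # the multi-planner catches this; the stats now describe the trial
    trial_stats = stage.rows_consumed

    stage.tracker = None  # the tracker pointer is cleared for the winning plan
    stage.close()
    stage.open()  # no early exit this time, so the sort consumes all input
    return trial_stats, stage.rows_consumed
```

In this model, `run_trial_then_reopen(list(range(1000)), 100)` returns `(100, 1000)`: the trial stopped after 100 rows, but the counter read after the re-open covers all 1000. Serializing the stats at the point where the exception is caught, before the close and re-open, would preserve the first number.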
- is related to: SERVER-57513 SBE multi-planning can incorrectly favor a blocking SORT plan over a non-blocking plan (Closed)