Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.2
Affects Version/s: None
Component/s: None
Labels:
- sbe-post-v1
- sbe-rollout

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:
Hide

The following script reproduces the bug if run against a server with SBE enabled:

(function() { "use strict"; const coll = db.c; coll.drop(); assert.commandWorked(coll.createIndex({a: 1})); assert.commandWorked(coll.createIndex({b: 1})); // Configure the server such that the SBE trial period should end after doing 10 reads from storage. assert.commandWorked(db.adminCommand({setParameter: 1, internalQueryPlanEvaluationWorks: 10})); for (let i = 0; i < 20; ++i) { assert.commandWorked(coll.insert({a: 1, b: 1, c: i})); } const explain = coll.find({a: 1, b: 1}).sort({c: 1}).explain("allPlansExecution"); // Since there are 20 documents, we expect that the full execution plan reports at least 20 keys // examined and 20 docs examined. // // It is possible that more than 20 keys/docs are examined, because we expect the plan to be closed // and re-opened during the trial period (which does not clear the execution stats from the first // open). assert.gte(explain.executionStats.totalKeysExamined, 20); assert.gte(explain.executionStats.totalDocsExamined, 20); // Extract the first plan in the "allPlansExecution" array. The winning plan is always reported // first. const winningPlanTrialPeriodStats = explain.executionStats.allPlansExecution[0]; // The number of keys examined and docs examined should both be less than 10, since we configured // the trial period for each candidate plan to end after 10 storage reads. assert.lte(winningPlanTrialPeriodStats.totalKeysExamined, 10); assert.lte(winningPlanTrialPeriodStats.totalDocsExamined, 10); }());

The final two assertions in the script fail due to the bug.
Show
The following script reproduces the bug if run against a server with SBE enabled: (function() { "use strict" ; const coll = db.c; coll.drop(); assert .commandWorked(coll.createIndex({a: 1})); assert .commandWorked(coll.createIndex({b: 1})); // Configure the server such that the SBE trial period should end after doing 10 reads from storage. assert .commandWorked(db.adminCommand({setParameter: 1, internalQueryPlanEvaluationWorks: 10})); for (let i = 0; i < 20; ++i) { assert .commandWorked(coll.insert({a: 1, b: 1, c: i})); } const explain = coll.find({a: 1, b: 1}).sort({c: 1}).explain( "allPlansExecution" ); // Since there are 20 documents, we expect that the full execution plan reports at least 20 keys // examined and 20 docs examined. // // It is possible that more than 20 keys/docs are examined, because we expect the plan to be closed // and re-opened during the trial period (which does not clear the execution stats from the first // open). assert .gte(explain.executionStats.totalKeysExamined, 20); assert .gte(explain.executionStats.totalDocsExamined, 20); // Extract the first plan in the "allPlansExecution" array. The winning plan is always reported // first. const winningPlanTrialPeriodStats = explain.executionStats.allPlansExecution[0]; // The number of keys examined and docs examined should both be less than 10, since we configured // the trial period for each candidate plan to end after 10 storage reads. assert .lte(winningPlanTrialPeriodStats.totalKeysExamined, 10); assert .lte(winningPlanTrialPeriodStats.totalDocsExamined, 10); }()); The final two assertions in the script fail due to the bug.
Sprint:
Query Execution 2021-07-12, Query Execution 2021-07-26
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When explain runs with the "allPlansExecution" verbosity, it is supposed to report runtime statistics from each of the candidate plans collected during the multi-planning trial period. To provide some quick background information: MongoDB currently selects a winning plan by generating a set of possible candidate plans and partially executing these candidate plans for a so-called "trial period". Statistics are collected during this trial period, and used to score each plan.

The "allPlansExecution" section in explain is designed to report the runtime statistics collected during this trial period. It is frequently used by our technical support team in order to help customers understand why a particular plan was chosen. For customer support, it is essential that we have sufficient diagnostic information to determine why a particular plan was chosen over another. The "allPlansExecution" output is the main tool at our disposal for doing so.

The problem is that when SBE is enabled, we can sometimes incorrectly report the stats for the winning plan in the "allPlansExecution" section. In particular, this can happen when the winning plan has a SORT stage. In order to ensure that the trial period does not run for too long when there is a sort stage, we have logic in sbe::SortStage to throw a special QueryTrialRunCompleted exception after the candidate plan has already performed a certain amount of work. This exception is caught by the multi-planning code, and at this point the SBE tree should contain statistics describing the trial period accurately.

However, before the statistics are serialized to BSON and added to the "allPlansExecution" section, the SBE multi-planner will close and re-open the winning plan. This time, opening the plan tree will not exit early with a QueryTrialRunCompleted exception (because the TrialRunTracker pointer has been cleared). As a result, the plan may run for much longer than it did during the actual trial period. In turn, the stats reported in the "allPlansExecution" will not reflect the actual trial period and will be more or less useless for understanding why this plan was chosen as the winner. See "Steps to Reproduce" for an example.

is related to

SERVER-57513 SBE multi-planning can incorrectly favor a blocking SORT plan over a non-blocking plan

Closed

Assignee:: Mohammad Dashti (Inactive)
Reporter:: David Storch
Participants:: David Storch, Githook User, Mohammad Dashti
Votes:: 0 Vote for this issue
Watchers:: 5 Start watching this issue

Created:: Jun 16 2021 10:34:46 PM UTC
Updated:: Oct 29 2023 09:52:03 PM UTC
Resolved:: Jul 23 2021 08:37:03 PM UTC
Confidence Status Last Update:: 01/Jul/21 12:30 PM

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates