[SERVER-56486] SBE multi-planning trial period could run much longer than its equivalent in the classic engine Created: 29/Apr/21  Updated: 29/Oct/23  Resolved: 16/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Mihai Andrei
Resolution: Fixed Votes: 0
Labels: sbe-post-v1, sbe-rollout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Query Execution 2021-05-17, Query Execution 2021-05-31, Query Execution 2021-06-14, Query Execution 2021-06-28, Query Execution 2021-07-12, Query Execution 2021-07-26
Participants:

 Description   

For context, here is a brief description of how the trial period in the classic engine works, assuming that we have two plans, A and B. Given a budget of "work units", usually set to something fairly large such as 10,000 works, we allow each plan to do a unit of work in a round robin fashion. That is, A performs one unit of work, then B performs one unit of work, then A performs a second work, B a second work, and so on. If at any point during this process one of the plans reaches EOF, then that plan becomes the winner and the trial period ends.

The SBE multi-planner is fundamentally different because there is no concept of "works". Instead, each plan is granted a similar budget of "reads". A read is defined here as a single operation on a storage engine cursor, such as an index seek or reading a single index key. The budget is calculated in the same way as for the classic engine, so it is something pretty large such as 10,000. Instead of the round-robin units of work, the SBE multi-planner will execute each plan until it either reaches its budget and "exits early", or reaches EOF. First, plan A executes until hitting its budget or EOF, and then plan B does the same.

Let's suppose that plan A reaches EOF with approximately 10 works/reads, but plan B requires 100,000. In a scenario like this, SBE multi-planning takes much longer than classic engine multi-planning to achieve the same result. The classic engine will work each plan 10 times, at which point A reaches EOF and the trial period ends. In contrast, SBE will first execute plan A. It performs 10 reads and finishes. Then, SBE will execute plan B. It executes 10,000 reads before it hits its budget and exits early. The key observation is that SBE has done orders of magnitude more work as part of the multi-planning trial period.
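The arithmetic in this scenario can be reproduced with a toy model. This is only an illustrative sketch, not server code: the function names are hypothetical, each plan is reduced to the number of works/reads it needs to reach EOF, and the classic budget is assumed to apply per plan.

```python
def classic_trial_cost(eof_costs, budget):
    """Total work units spent by a round-robin trial period (toy model)."""
    total = 0
    for round_no in range(1, budget + 1):
        for cost in eof_costs:
            total += 1  # this plan performs one unit of work
            if round_no >= cost:
                return total  # plan reached EOF: it wins and the trial ends
    return total  # every plan used up its budget without reaching EOF


def sbe_trial_cost(eof_costs, budget):
    """Total reads spent when each plan runs to EOF or budget in turn (toy model)."""
    return sum(min(cost, budget) for cost in eof_costs)


plans = [10, 100_000]  # plan A needs 10 units to reach EOF, plan B needs 100,000
print(classic_trial_cost(plans, 10_000))  # 19
print(sbe_trial_cost(plans, 10_000))      # 10010
```

In this model the classic trial stops after about 20 units of work in total, while the SBE trial spends 10 reads on plan A and the full 10,000-read budget on plan B.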

One way to mitigate this would be to progressively lower the budget of reads as each plan runs its trial period. However, this would only work if the best plan happens to run its trial period first. The issue would still occur if the bad plans run their trial period first, and the best plan runs its trial last.
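The order dependence of this mitigation can be seen in a small sketch. It is a toy model under the same simplification as the scenario above (a plan is just the number of reads it needs to reach EOF), and the function name is hypothetical rather than anything in the server.

```python
def sbe_progressive_cost(eof_costs, budget):
    """Total reads when each plan's usage caps the budget of later plans (toy model)."""
    total = 0
    for cost in eof_costs:
        used = min(cost, budget)  # run until EOF or until the budget is hit
        total += used
        budget = used  # later plans may not read more than this plan did
    return total


print(sbe_progressive_cost([10, 100_000], 10_000))  # 20: best plan runs first
print(sbe_progressive_cost([100_000, 10], 10_000))  # 10010: best plan runs last
```

When the cheap plan runs first, the expensive plan is cut off after 10 reads. When the expensive plan runs first, it still consumes the full initial budget before the cheap plan gets a chance to lower it.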



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 16/Jul/21 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-56486 Improve SBE multiplanner

This patch introduces a number of improvements to the SBE multiplanner.
In particular:

  • We no longer run plans in round-robin fashion (i.e. switch between
    plans until each has returned one result, and so on). Instead, each
    plan runs until it finishes its trial period.
  • The number of reads is progressively decreased. More precisely, the
    number of reads used by a plan during its trial period is used as the
    maximum number of reads for subsequent plans.
  • Non blocking plans are run before blocking ones. This is a heuristic
    that limits the amount of work done by a blocking plan as each blocking
    plan will now be bounded by the smallest number of reads used by a non
    blocking plan during its trial period.
Branch: master
https://github.com/mongodb/mongo/commit/e4d2f7606d0500028c8cb1389ac86f1a93cecad1

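As a rough illustration of the heuristics listed in this commit message (not the actual patch), the sketch below orders non-blocking plans ahead of blocking ones and then applies the progressively decreasing read budget. The `CandidatePlan` type, its fields, and `run_trials` are hypothetical names introduced only for this example, and each plan is modeled simply as the number of reads it needs to finish its trial.

```python
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    name: str
    reads_to_finish: int  # reads the plan needs to finish its trial (toy model)
    blocking: bool        # True if the plan contains a blocking stage


def run_trials(plans, budget):
    """Return the reads used by each plan, running non-blocking plans first."""
    # False sorts before True, and sorted() is stable, so non-blocking plans
    # keep their relative order and run ahead of all blocking plans.
    ordered = sorted(plans, key=lambda p: p.blocking)
    reads_used = {}
    for plan in ordered:
        used = min(plan.reads_to_finish, budget)
        reads_used[plan.name] = used
        budget = used  # cap later plans at what this plan used
    return reads_used


plans = [
    CandidatePlan("blocking_sort", 50_000, blocking=True),
    CandidatePlan("index_scan", 10, blocking=False),
]
print(run_trials(plans, 10_000))  # {'index_scan': 10, 'blocking_sort': 10}
```

Without the reordering, the blocking plan in this example would run first and consume the full 10,000-read budget before the cheaper plan could lower it.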
Comment by Githook User [ 01/Jul/21 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: Revert "SERVER-56486 [SBE] Run non blocking plans first in multiplanner to set an upper bounds for blocking plans"

This reverts commit 3645e5b8f5367a1c543d73c441c1364eeac2d783.
Branch: master
https://github.com/mongodb/mongo/commit/951dfeb5aca55cc881293372dfda4b72bb478b92

Comment by Githook User [ 01/Jul/21 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-56486 [SBE] Run non blocking plans first in multiplanner to set an upper bounds for blocking plans
Branch: master
https://github.com/mongodb/mongo/commit/3645e5b8f5367a1c543d73c441c1364eeac2d783

Generated at Thu Feb 08 05:39:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.