[SERVER-62150] SBE Multiplanning can be slow when suboptimal plan runs first Created: 17/Dec/21  Updated: 27/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 5.1.1, 5.2.0-rc1, 6.0.12, 7.0.4
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Mihai Andrei Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 3
Labels: RDY
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-82549 MongoDB 7.0.2 SBE query slow when dir... Closed
Problem/Incident
Related
related to SERVER-82677 Deduplicate index scan + fetch plans ... In Code Review
related to SERVER-62981 Make SBE multi-planner's trial period... Closed
related to SERVER-63102 Make separate internalQueryPlanEvalua... Closed
related to SERVER-63642 Add serverStatus metrics to measure m... Closed
related to SERVER-63641 Improve SBE multi-planning by choosin... Closed
is related to SERVER-83196 SBE multiplanning may chooses the wro... Open
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2022-01-24, QO 2023-11-13, QO 2023-11-27
Participants:
Case:

 Description   

Currently, the strategy used in SBE multiplanning is as follows:

  • We run non-blocking plans before blocking ones.
  • We run each plan’s trial period to completion (i.e., until it returns 101 documents or uses up the plan’s budget). We then use the number of reads performed by that plan to bound the number of reads allowed for any remaining plans.

The problem with this approach is that if the first plan we run is not the optimal one, we are stuck running it and can potentially use up the entire reads budget. As an example, consider two plans, A and B. Plan A needs to perform 10k storage engine reads to return 101 documents, while plan B needs only 101 reads to return 101 documents. If plan B runs first, there is no problem: we set the reads limit for plan A to 101, and it stops after 101 reads. If plan A runs first, however, we are stuck running plan A for all 10k reads. Though we will eventually run plan B and it will be chosen, this negatively impacts the performance of queries that need to use the multiplanner.
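To make the read accounting concrete, here is a minimal standalone sketch of the strategy described above under a toy cost model; the names (CandidatePlan, runTrialPeriod, readsPerDoc) are hypothetical and are not taken from the server code:

{code:cpp}
// Toy model of the current SBE multi-planning trial strategy: each plan's
// trial runs to completion (101 documents or the reads budget), and the reads
// used by a completed trial bound the budget for the remaining plans.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct CandidatePlan {
    std::string name;
    int64_t readsPerDoc;     // hypothetical stand-in for the plan's real cost
    int64_t readsUsed = 0;
    int64_t docsReturned = 0;
};

// Run one plan's trial period: stop after `docTarget` documents or once the
// next read would exceed `readBudget`.
void runTrialPeriod(CandidatePlan& plan, int64_t docTarget, int64_t readBudget) {
    while (plan.docsReturned < docTarget &&
           plan.readsUsed + plan.readsPerDoc <= readBudget) {
        plan.readsUsed += plan.readsPerDoc;
        ++plan.docsReturned;
    }
}

int main() {
    const int64_t kDocTarget = 101;     // trial ends after 101 documents
    const int64_t kMaxBudget = 10'000;  // initial per-plan reads budget

    // Plan A needs ~100 reads per document; plan B needs 1 read per document.
    std::vector<CandidatePlan> runAFirst = {{"A", 100}, {"B", 1}};
    std::vector<CandidatePlan> runBFirst = {{"B", 1}, {"A", 100}};

    for (auto* plans : {&runAFirst, &runBFirst}) {
        int64_t budget = kMaxBudget;
        int64_t totalReads = 0;
        for (auto& plan : *plans) {
            runTrialPeriod(plan, kDocTarget, budget);
            totalReads += plan.readsUsed;
            // The completed trial bounds the reads allowed for later plans.
            budget = std::min(budget, plan.readsUsed);
        }
        std::cout << "run " << (*plans)[0].name << " first: total trial reads = "
                  << totalReads << "\n";
    }
    // With this toy model, running A first costs ~10,100 trial reads, while
    // running B first costs ~200.
    return 0;
}
{code}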



 Comments   
Comment by Johnny Shields [ 01/Dec/23 ]

David, Ivan, thank you both; it means a lot to me as a customer that MongoDB is tackling these issues with high priority.

Comment by David Storch [ 01/Dec/23 ]

To add onto what Ivan said, I'm going to move this ticket back to the "Open" state – we are no longer "Investigating" this issue, but rather are executing on an engineering project to fix the problem. The solution will be delivered against a sequence of related Jira tickets rather than developing directly against this ticket. However, we can provide high-level progress updates here.

Comment by Ivan Fefer [ 30/Nov/23 ]

Yes, I am aware.

We are working on a solution. However, it requires a redesign of the whole multi-planning process in SBE and will take some time to develop, test, and release.

To improve customer experience in the meantime, we are planning a change in default configuration via SERVER-83470.

We are also going to improve our testing process so that we do not miss issues like this in the next SBE release.

Comment by Johnny Shields [ 30/Nov/23 ]

ivan.fefer@mongodb.com, are you also aware of this issue? This is another one we are seeing related to SBE.

Comment by Johnny Shields [ 08/Nov/23 ]

Following from my report in SERVER-82549, I'd like to underscore how app-breaking and frustrating this issue is in MongoDB 7. (We didn't see any effects in Mongo 5 or 6.)

I anticipate many CRUD apps on MongoDB will be affected by this. When we initially upgraded to MongoDB 7 ~3 weeks ago we saw it breaking our app in a number of critical places. Please give this issue the attention it deserves.

Comment by Ana Meza [ 12/Apr/22 ]

Waiting on other tickets first

Comment by David Storch [ 07/Apr/22 ]

Returning this to the triage queue. At the moment our efforts related to this problem fall under SERVER-63642 and SERVER-63641, so there is no action currently planned for me against this umbrella ticket.

Comment by David Storch [ 14/Feb/22 ]

Another quick update. We have filed two additional offshoot tickets:

  • SERVER-63642 "Add serverStatus metrics to measure multi-planning performance". The work for this ticket would add telemetry to help us understand the performance of the SBE multi-planner across the Atlas fleet.
  • SERVER-63641 "Improve SBE multi-planning by choosing which plan to work next based on a priority metric". This ticket tracks the improvement to the SBE multi-planning algorithm proposed by mihai.andrei, which I have already summarized above. The ticket description contains a more detailed writeup of the proposed change. This work could help to improve the performance of SBE multi-planning beyond what was already achieved in SERVER-62981.

Folks interested in this ticket may wish to watch these two new related ones. This ticket will continue to serve as the umbrella. There is no specific engineering work planned against the umbrella ticket at this time, but SERVER-63642 is scheduled and SERVER-63641 will be triaged by the Query Execution team.

Comment by David Storch [ 28/Jan/22 ]

Related ticket SERVER-62981 has now been completed for versions 5.3.0 and 5.2.1, which we anticipate will help a lot with the problem described in this ticket.

Comment by David Storch [ 25/Jan/22 ]

The Query Team has been internally brainstorming several potential solutions to this problem. We have generated a handful of ideas of varying implementation complexity, which I will describe below, mostly for the benefit of query engineering. However, we think there is one simple change that we should implement immediately, which we expect will go a long way toward mitigating the problem described here: SERVER-62981. I suggest that folks interested in this ticket also watch SERVER-62981.

Once SERVER-62981 is complete, we could consider pursuing one of the following additional changes in the future:

  • mihai.andrei's idea: during SBE multi-planning, always call getNext() on whichever plan currently seems the most promising (a rough sketch appears after this list). This shouldn't be too hard to implement, but it still suffers from the problem that a single call to getNext() for an unselective plan could expend the entire reads budget.
  • christopher.harris points out that we could use the classic multi-planner for plan selection, but then hand the winning plan off to the SBE engine for execution. In this scheme, we would continue to use SBE when recovering plans from the plan cache. One downside of this approach is that any partial results computed during the trial period would have to be thrown out completely, and execution restarted from the beginning in SBE. It would also be quite complex to implement. However, it would mean that enabling SBE would not change the behavior of the system with regard to plan selection. It would also give us an opportunity to restore some of the more useful aspects of the explain format at "allPlansExecution" verbosity.
  • I propose a two-phase approach in the SBE multi-planner (also sketched after this list). The first round would run a trial period for each candidate plan much like the SBE multi-planner's current process, but with a much smaller reads budget, on the order of 100 or 200. If any plan hits EOF or produces its first batch of results, then a winner is chosen according to the current ranking formula. Otherwise, we move on to the second round using the much larger reads budget of 10,000. The idea is to make sure that multi-planning terminates as quickly as possible, without exploring bad candidate plans, when one of the available candidates is very cheap.
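For illustration, here is a rough standalone sketch of the priority-based idea from the first bullet, using a toy cost model and entirely hypothetical names (Candidate, productivity(), workOne()); each iteration advances whichever candidate has been most productive so far, so the cheap plan wins after roughly 200 reads even though the expensive plan is tried first:

{code:cpp}
// Toy model of "work the most promising plan next": instead of running each
// trial to completion in turn, each step advances the candidate with the best
// documents-per-read ratio so far.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct Candidate {
    std::string name;
    int64_t readsPerDoc;    // hypothetical stand-in for the plan's real cost
    int64_t readsUsed = 0;
    int64_t docsReturned = 0;

    // Priority metric: documents produced per read, optimistic for untried plans.
    double productivity() const {
        return readsUsed == 0 ? 1.0
                              : static_cast<double>(docsReturned) / readsUsed;
    }

    // One getNext() call: spend the reads needed to produce one document.
    void workOne() {
        readsUsed += readsPerDoc;
        ++docsReturned;
    }
};

int main() {
    const int64_t kDocTarget = 101;
    const int64_t kTotalBudget = 10'000;

    std::vector<Candidate> candidates = {{"A", 100}, {"B", 1}};
    int64_t totalReads = 0;

    while (totalReads < kTotalBudget) {
        auto best = std::max_element(
            candidates.begin(), candidates.end(),
            [](const Candidate& a, const Candidate& b) {
                return a.productivity() < b.productivity();
            });
        best->workOne();
        totalReads += best->readsPerDoc;
        if (best->docsReturned >= kDocTarget) {
            std::cout << "winner: " << best->name << " after " << totalReads
                      << " total reads\n";
            return 0;
        }
    }
    std::cout << "budget exhausted after " << totalReads << " reads\n";
    return 0;
}
{code}

As the bullet notes, this still shares the weakness that a single getNext() for an unselective plan could be arbitrarily expensive; the toy model charges a fixed readsPerDoc per call and therefore hides that.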
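The two-phase proposal can be sketched the same way, again with hypothetical names and a toy cost model; the point is that a small round-one budget lets a very cheap candidate finish and be selected before anything spends the full 10,000-read budget:

{code:cpp}
// Toy model of the proposed two-phase trial: a first round with a small reads
// budget, and a second round with the full 10,000-read budget only if no plan
// reaches EOF or a full first batch in round one.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct Plan {
    std::string name;
    int64_t readsPerDoc;  // hypothetical cost model
    int64_t totalDocs;    // documents the plan would produce before EOF
};

struct TrialResult {
    std::string name;
    int64_t readsUsed = 0;
    int64_t docsReturned = 0;
    bool reachedEof = false;
};

// Run one plan until it returns `docTarget` documents, hits EOF, or the next
// read would exceed `readBudget`.
TrialResult runTrial(const Plan& plan, int64_t docTarget, int64_t readBudget) {
    TrialResult r{plan.name};
    while (r.docsReturned < docTarget &&
           r.readsUsed + plan.readsPerDoc <= readBudget) {
        if (r.docsReturned == plan.totalDocs) {
            r.reachedEof = true;
            break;
        }
        r.readsUsed += plan.readsPerDoc;
        ++r.docsReturned;
    }
    return r;
}

int main() {
    const int64_t kDocTarget = 101;
    const std::vector<Plan> plans = {{"A", 100, 1'000'000}, {"B", 1, 1'000'000}};

    // Round one uses a much smaller budget; round two uses the full budget.
    for (int64_t budget : {int64_t{200}, int64_t{10'000}}) {
        for (const auto& plan : plans) {
            TrialResult r = runTrial(plan, kDocTarget, budget);
            // The real proposal would rank all finished plans with the existing
            // ranking formula; for brevity we take the first plan that finishes.
            if (r.reachedEof || r.docsReturned >= kDocTarget) {
                std::cout << "winner with budget " << budget << ": " << r.name
                          << " (" << r.readsUsed << " trial reads)\n";
                return 0;
            }
        }
        // No plan finished within this budget; fall through to the larger one.
    }
    std::cout << "no plan completed; choose by the existing ranking formula\n";
    return 0;
}
{code}

In this toy example, plan A's round-one trial is capped at 200 reads and plan B finishes within the first round, so the 10,000-read second round is never reached.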