[SERVER-82217] Filter query with sort performs much worse with SBE Created: 16/Oct/23  Updated: 30/Nov/23  Resolved: 28/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.2
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Philipp Boe Assignee: Ivan Fefer
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File query-classic-explain.txt     Text File query-sbe-explain-wihout-sort.txt     Text File query-sbe-explain.txt     File repro.js    
Issue Links:
Duplicate
duplicates SERVER-83196 SBE multiplanning may chooses the wro... Open
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2023-11-27, QE 2023-12-11
Participants:

 Description   

We have recently upgraded from 6.0.4 to Mongo 7.0.2 and noticed a significant drop in query performance for certain filter queries using a sort.

For example, this query takes 13 seconds with SBE:

 

rs0 [direct: secondary] mydb> db.version();
7.0.2
rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).sort({ _id: -1 }).explain('executionStats').executionStats.executionTimeMillis;
13246 

 

With the classic engine it takes 1.5 seconds:

rs0 [direct: secondary] mydb> db.adminCommand({ setParameter: 1, internalQueryFrameworkControl: 'forceClassicEngine'})
{
  was: 'forceClassicEngine',
  ok: 1,
  ..
}
rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).sort({ _id: -1 }).explain('executionStats').executionStats.executionTimeMillis;
1554 

The collection contains around 600K documents and there is a compound index that contains the "fsType" field (as the first key).

It looks like with SBE the query doesn't use the index because it seems to be processing all 600k documents. I've attached the explain outputs for the queries as files.

Without the sort param SBE performs well (and seems to use the index):

 

rs0 [direct: secondary] mydb> db.mycoll.find({"fsType": "PageRef","page.formData.pt_title.value": "a"}).explain('executionStats').executionStats.executionTimeMillis;
276

 



 Comments   
Comment by Ivan Fefer [ 30/Nov/23 ]

No, there is no stable API for switching to classic, as the exact query engine used to process the query is a deep implementation detail and we are working to make the transition to SBE as invisible as possible. I understand that in that case we failed to do so.

In general, our advice is not to set internalQueryFrameworkControl without instructions from our customer support.

We have a plan to address the issue of SBE multiplanning in the next few patch releases for 7.0.

With regards to continue using classic engine right now, your options are:

  1. Set internalQueryFrameworkControl to forceClassicEngine.
    I would recommend removing this option after you upgrade to version >= 7.0.6.
  2. Postpone 7.0 upgrade until 7.0.6.
  3. Implement other suggestions and keep using default value of internalQueryFrameworkControl. I understand that these might require changes in your applications.

 

Comment by Philipp Boe [ 30/Nov/23 ]

Thanks for the info. I'll give these options a try.

I do have a question regarding:

If this doesn't help than switching back to Classic engine should.

Is there a public/stable API for switching back to the classic engine? We've been using the internalQueryFrameworkControl parameter for that, but as edwin.zhou@mongodb.com pointed out this is an internal parameter.

Comment by Ivan Fefer [ 29/Nov/23 ]

philipp.boersteken@crownpeak.com 

In the meantime, what can you do to in your case is:
1. Clear plan cache and run the query with a real value for "page.formData.pt_title.value" field. This will allow multi planner to correctly estimate the usefulness of index of "fsType".
2. Create an index on {fsType: 1, page.formData.pt_title.value: 1}.
3. Or even create an index on {fsType: 1, page.formData.pt_title.value: 1, _id: -1} - this way sort won't be needed at all.
4. If this doesn't help than switching back to Classic engine should. 

Comment by Ivan Fefer [ 28/Nov/23 ]

I have attached repro.js- js script that reproduces the problem.

Comment by Ivan Fefer [ 28/Nov/23 ]

philipp.boersteken@crownpeak.com Thanks again for taking time to report this and giving us the practical example of this problem.

This situation is a part of bigger issue with multi planning design for SBE that we are actively working on.

Unfortunately, there is no quick fix for this problem, but more systematic solution is being created as I type this.

I am going to close this ticket as a duplicate of SERVER-83196.

 

Comment by Ivan Fefer [ 28/Nov/23 ]

I investigated some more.

Here are my findings:

  1. In the provided example, non-existing "a" string is used in a predicate for "page.formData.pt_title.value" field. That confuses the multi planner, as any plan will return 0 documents.
  2. In the absence of "productivity" estimation, we are left with 2 things: heuristic bonuses and EOF bonus.
  3. EOF bonus just makes sure that the plan is selected as best if it was able to complete in trial time.
  4. There are several small heuristic bonuses, like noFetchBonus or noSortBonus that are basically tie breakers in cases when productivity is roughly the same. 

Now let's examine what's happening in Classic engine:

  1. When using classic engine multi planner can call all plans (even "blocking" ones with sort) in round robin fashion. This allows us to set pretty high limits on the total number of calls to work() function during trial period: the default is currently 0.3 * collection size.
  2. Because of this, (IXSCAN fsType + SORT) plan reaches EOF, as fsType predicate is selective enough to leave less than 30% of the collection to check.
  3. As this plan reaches EOF in trial time, it is selected as "correct" and cached.

What's happening in SBE engine:

  1. In SBE there is no NEED_TIME result for plan execution, so blocking stages like sort or group will try to consume all of the input on the first call. There is no graceful way to interrupt this process.
  2. So multi planner in SBE has to set much smaller limits on trial period. Currently it is allowed to read around 10000 documents. We can't afford to let an inefficient plan to read 30% of a whole collection like we do in Classic.
  3. Because of this, no plan reaches EOF during trial in SBE. 
  4. Because productivity of both plans are equally 0, the plan without sort wins, as it gets noSortBonus.

 

Comment by Ivan Fefer [ 27/Nov/23 ]

I also see that there are no returned documents at all (because of "a" as a search value).
So multi planner have no data to compare both plans, as they both return 0 documents, so it picks non-blocking plan (we need to re-think this choice).
It should behave better if a real value is searched for (plan cache might need to be cleared).

Comment by Ivan Fefer [ 27/Nov/23 ]

How does the classic multi planner handles blocking stages?

Comment by Edwin Zhou [ 27/Oct/23 ]

Hi philipp.boersteken@crownpeak.com,

Thank you for your detailed ticket description and for providing the explain plans. Since "internalQueryFrameworkControl" is an internal parameter, we have chosen not to document this at this time.

I will pass this over to our Query team to evaluate the execution of these queries.

Kind regards,
Edwin

Comment by Philipp Boe [ 16/Oct/23 ]

On a side note: It took us quite some time to figure out how to force the classic query engine because we couldn't find any documentation. Are there any plans to document the internalQueryFrameworkControl parameter (or any other public API variant of that parameter)?

Generated at Thu Feb 08 06:48:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.