[SERVER-82549] MongoDB 7.0.2 SBE query slow when direct index exists Created: 29/Oct/23 Updated: 17/Nov/23 Resolved: 07/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Johnny Shields | Assignee: | David Storch |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Execution
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: | Here is an example query which executed in 1.173 seconds (we are seeing up to 10 seconds). Since there is a direct index on customer_user_id: 1, we would expect this to return immediately. Note that if you remove deleted_at: null and sort: _id: 1 from the below the result is essentially the same.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | QO 2023-11-13 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
(This ticket is one of several issues we at TableCheck had when upgrading from MongoDB 6.0.11 to 7.0.2; refer to https://support.mongodb.com/case/01207868) We have a collection Customers with a scalar ID field "customer_user_id" (foreign key). When we query with complex conditions on this field, we see queries taking up to 10 seconds, for example:
If we remove the franchise_id condition, the query returns immediately:
Note that both queries are ultimately selecting the same index (customer_user_id_1). Anecdotally we did not see the same issue on the "Classic" query engine. Also, if we hint the affected query with the index it ultimately selects, it executes immediately (executionTimeMillis = 1) |
| Comments |
| Comment by Jussi Kauppila [ 17/Nov/23 ] | ||
|
We have the exactly same issue. Query planner picks up the index for sort instead of the optimal index for the key value search.
| ||
| Comment by Johnny Shields [ 08/Nov/23 ] | ||
|
Hi David, thank you for this detailed explanation, it is useful. I've left a comment on SERVER-62150 to give an idea of the scope of impact for us, hopefully it is useful in helping to get it prioritized. | ||
| Comment by David Storch [ 07/Nov/23 ] | ||
|
Hi shields@tablecheck.com, thanks for providing this detailed report! The explain output was quite useful and allowed us to confirm that this is a manifestation of known issue SERVER-62150. Therefore, I am closing this issue as a duplicate of SERVER-62150 but you can read on for additional technical details. MongoDB's query optimizer has always chosen a data access path (i.e. which index to use) by partially executing a set of candidate plans, and then examining the resulting runtime statistics in order to choose a winning plan. This process is called multi-planning, and it exists for both for the Classic Execution Engine as well as Slot-Based Execution (SBE). However, due to some implementation details that I won't describe here, SBE has its own implementation of multi-planning which differs from the Classic Engine implementation of multi-planning in some key respects. SERVER-62150 is a flaw in the SBE multi-planning process which can cause the partial execution of candidate plans (known as the "trial period") to go on for too long. In the explain output you provided for the slow query, I can see in the allPlansExecution section that that there are 15 candidate plans generated. The winning plan does just a handful of reads from storage, reading two index keys and two documents. Amongst the remaining 14 rejected plans, 13 of them did exactly 10,000 reads from storage. 10,000 is the preconfigured limit for the maximum number of reads allowed during the SBE trial period. Each of these trial periods is reported in the explain to have taken anywhere from ~100ms to more than 1 second, which clearly explains the poor query performance you are experiencing. The problem goes away when you provide a hint and when you rewrite the query to only include the predicate on customer_user_id, since in either case the system ends up generating just one candidate plan and therefore does not need to go through the SBE multi-planner. Note that we have implemented a few improvements already around SBE multi-planning: see Hope that all makes sense, but let me know if you have any other follow-up questions! | ||
| Comment by Johnny Shields [ 29/Oct/23 ] | ||
|
I have attached 3 explain() outputs:
|