[SERVER-67105] $in queries do not use clustered index Created: 08/Jun/22 Updated: 29/Oct/23 Resolved: 22/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.0, 6.0.0 |
| Fix Version/s: | 7.0.0-rc0, 6.0.6, 6.3.2 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Hana Pearlman |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | query-product-scope-1, query-product-urgency-2, query-product-value-1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Backport Requested: |
v6.3, v6.0, v5.0
|
||||||||||||||||||||
| Sprint: | QO 2023-03-20, QO 2023-04-03 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||
| Description |
|
A query that uses $in on _id does not plan against a clustered index if one exists. If no suitable secondary index exists, we fall back to a collection scan. This affects queries directly on time-series buckets collections.
|
| Comments |
| Comment by Githook User [ 18/Apr/23 ] | ||||||||||||||||||
|
Author: {'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}Message: | ||||||||||||||||||
| Comment by Githook User [ 04/Apr/23 ] | ||||||||||||||||||
|
Author: {'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}Message: | ||||||||||||||||||
| Comment by Hana Pearlman [ 22/Mar/23 ] | ||||||||||||||||||
|
Yesterday, we merged an improvement to set min/max bounds on the clustered collscan based on the values in the $in list. For the example query in the ticket description, the new behavior looks like this:
Note the minRecord and maxRecord values on the ixscan. SERVER-75063 tracks a potential further improvement which is more complex but would give even tighter bounds. | ||||||||||||||||||
| Comment by Githook User [ 22/Mar/23 ] | ||||||||||||||||||
|
Author: {'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}Message: | ||||||||||||||||||
| Comment by Kyle Suarez [ 07/Mar/23 ] | ||||||||||||||||||
|
We may consider doing this work as part of (or related to) kevin.cherkauer@mongodb.com's work to support clustered collections in SBE. CC amr.elhelw@mongodb.com | ||||||||||||||||||
| Comment by Kyle Suarez [ 07/Mar/23 ] | ||||||||||||||||||
|
Sending to the director triage queue for myself and bernard.gorman@mongodb.com to discuss for assignment. | ||||||||||||||||||
| Comment by Louis Williams [ 09/Jun/22 ] | ||||||||||||||||||
|
james.wahlin@mongodb.com provided a workaround: To avoid using a collection scan, provide $lte and $gte bounds to the query after manually identifying the largest and smallest _id values. For example:
|