[SERVER-67105] $in queries do not use clustered index Created: 08/Jun/22  Updated: 29/Oct/23  Resolved: 22/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.0, 6.0.0
Fix Version/s: 7.0.0-rc0, 6.0.6, 6.3.2

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Hana Pearlman
Resolution: Fixed Votes: 2
Labels: query-product-scope-1, query-product-urgency-2, query-product-value-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
causes SERVER-76102 handleRIDRangeScan() sets hasCompatib... Closed
Related
is related to SERVER-75063 Support more complex scan bounds for ... Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.3, v6.0, v5.0
Sprint: QO 2023-03-20, QO 2023-04-03
Participants:
Case:

 Description   

A query that uses $in on _id does not plan against a clustered index if one exists. If no suitable secondary index exists, we fall back to a collection scan.

This affects queries directly on time-series buckets collections.

test> db.createCollection('clustered', {clusteredIndex: {key: {_id: 1}, unique: true}})
test> db.clustered.find({_id: {$in: [0,1,2,3]}}).explain()
{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'test.clustered',
    indexFilterSet: false,
    parsedQuery: { _id: { '$in': [ 0, 1, 2, 3 ] } },
    queryHash: '0D7870DC',
    planCacheKey: '0D7870DC',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'COLLSCAN',
      filter: { _id: { '$in': [ 0, 1, 2, 3 ] } },
      direction: 'forward'
    },
    rejectedPlans: []
  },



 Comments   
Comment by Githook User [ 18/Apr/23 ]

Author:

{'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}

Message: SERVER-67105: Allow in queries to use clustered index
Branch: v6.3
https://github.com/mongodb/mongo/commit/f37a53c33d7e032d1708c79e1dd507ec46ed3c52

Comment by Githook User [ 04/Apr/23 ]

Author:

{'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}

Message: SERVER-67105: Allow in queries to use clustered index
Branch: v6.0
https://github.com/mongodb/mongo/commit/27f44ffcfb8c23e03a5fcde4ffeb20b63a19ac75

Comment by Hana Pearlman [ 22/Mar/23 ]

Yesterday, we merged an improvement to set min/max bounds on the clustered collscan based on the values in the $in list. For the example query in the ticket description, the new behavior looks like this:

test> db.createCollection(‘clustered’, {clusteredIndex: {key: {_id: 1}, unique: true}})
test> db.clustered.find({_id: {$in: [0,1,2,3]}}).explain()
{
        "explainVersion" : "1",
        "queryPlanner" : {
                "namespace" : "test.clustered",
                "indexFilterSet" : false,
                "parsedQuery" : { "_id" : { "$in" : [0,1,2,3] }},
            ...
                "winningPlan" : {
                        "stage" : "CLUSTERED_IXSCAN",
                        "filter" : { "_id" : { "$in" : [0, 1, 2, 3 ] }},
                        "direction" : "forward",
                        "minRecord" : 0,
                        "maxRecord" : 3
                },
                "rejectedPlans" : [ ]
        },

Note the minRecord and maxRecord values on the ixscan.

SERVER-75063 tracks a potential further improvement which is more complex but would give even tighter bounds.

Comment by Githook User [ 22/Mar/23 ]

Author:

{'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}

Message: SERVER-67105: Allow in queries to use clustered index
Branch: master
https://github.com/mongodb/mongo/commit/9f391ea93fa62f57d6b4576130283d98eec69456

Comment by Kyle Suarez [ 07/Mar/23 ]

We may consider doing this work as part of (or related to) kevin.cherkauer@mongodb.com's work to support clustered collections in SBE. CC amr.elhelw@mongodb.com

Comment by Kyle Suarez [ 07/Mar/23 ]

Sending to the director triage queue for myself and bernard.gorman@mongodb.com to discuss for assignment.

Comment by Louis Williams [ 09/Jun/22 ]

james.wahlin@mongodb.com provided a workaround: To avoid using a collection scan, provide $lte and $gte bounds to the query after manually identifying the largest and smallest _id values.

For example:

db.clustered.find({$and: [{_id: {$gte: 1}}, {_id: {$lte: 3}}, {_id: {$in: [1, 2, 3]}}]});

Generated at Thu Feb 08 06:07:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.