[SERVER-13197] Tighten index bounds and allow compound index to be chosen when predicate on leading field is not provided Created: 14/Mar/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Neil Lunn Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 8
Labels: query, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File server13197.js    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-28957 Index bounds are not detected correctly Closed
is duplicated by SERVER-31064 query isn't index only unless project... Closed
is duplicated by SERVER-31100 Since 2.6, full collection scan inste... Closed
is duplicated by SERVER-2140 query optimizer should be able to pic... Closed
is duplicated by SERVER-15636 Query Optimizer Index Scans when firs... Closed
Related
related to SERVER-15239 serverStatus metrics incorrect: "oper... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Updated description:

For a compound index {a: 1, b: 1}, a range query {b: {$gt: 10, $lt: 20}} (using a hint on the compound index) produces the following index bounds:

// 2.4:
{a: [{$minElement: 1}, {$maxElement: 1}], b: [10, 20]}
 
// 2.6
{a: [{$minElement: 1}, {$maxElement: 1}], b: [{$minElement: 1}, {$maxElement: 1}]}

Bounds could be tighter under 2.6.

As a workaround for 2.6, consider adding the leading field to the query.

// before:
{b: {$gt: 10, $lt: 20}}
 
// workaround:
{a: {{$gt: MinKey, $lt: MaxKey}, b: {$gt: 10, $lt: 20}}

--------
This is the output from the reproduction steps in current 2.4 releases:

{
        "cursor" : "BtreeCursor name_1_age_1",
        "isMultiKey" : false,
        "n" : 60803,
        "nscannedObjects" : 60803,
        "nscanned" : 60827,
        "nscannedObjectsAllPlans" : 60803,
        "nscannedAllPlans" : 60827,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 207,
        "indexBounds" : {
                "name" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                21,
                                30
                        ]
                ]
        },
        "server" : "ubuntu:27017"
}

The range shown in the index bound comes from the range on the query, and is the second element in the compound key.

The same data in 2.6.0-rc0 and rc1 produces this:

{
        "cursor" : "BtreeCursor name_1_age_1",
        "isMultiKey" : false,
        "n" : 60803,
        "nscannedObjects" : 200000,
        "nscanned" : 200000,
        "nscannedObjectsAllPlans" : 200000,
        "nscannedAllPlans" : 200000,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 1562,
        "nChunkSkips" : 0,
        "millis" : 646,
        "indexBounds" : {
                "name" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "raring:27017",
        "filterSet" : false
}

So the range is now gone, nscanned is up to the full size of the collection as also has the execution time increased as a result of the scan.

As stated, the matching range is on the second element of the compound key, but if this is intentional why does the prior version match the same number of documents and in considerably less time?



 Comments   
Comment by Neil Lunn [ 18/Apr/14 ]

Glad that this is cleared up. I actually did become aware that specifying the MinKey and MaxKey values in the query would correct the behavior in the optimizer a few days ago, and then only after an exhaustive search of the new Index interface code.

The usage of the hint in the example was actually just a simulation of the intended sort operation as referenced in this question that was asked on stackoverflow http://stackoverflow.com/questions/22276988/mongodb-how-does-it-avoid-full-collection-scan

So the point was to explain that the optimizer should be using the bounds when specified as such, yet this led to a false call of a bug in the 2.4 release, so hence the submission here.

So I am glad that this confirms that the bounds should be used, and that use of MinKey and MaxKey should be an acceptable workaround until the previous behavior is restored.

Regards,
Neil

P.S Much better title for the issue as it is very descriptive of the actual problem

Comment by Benety Goh [ 17/Apr/14 ]

HI neillunn,

Thanks for bringing this to our attention. This is indeed a regression in our index bounds logic. We have been able to reproduce your results and would like suggest a workaround for your application.

Instead of specifying a predicate on age and using an explicit hint on the compound index, we'd like to suggest amending the query to include a MinKey, MaxKey range on name:

db.names.find({
    name: {$gt: MinKey, $lt: MinKey},
    age: {$gte: 21, $lte: 30} 
}).explain()

With the additional predicate on name in the query, the hint is no longer necessary.

Regards,
Ben

Comment by Neil Lunn [ 03/Apr/14 ]

It would be nice to have some kind of progress update on this. Given the extensive re-factoring of the index API this very much "smells" of a regression.

A quick perusal of the code seems to indicate that various methods that were present, prior to the re-factor seem have been omitted. But do to the re-factor this is very hard to lock down without adequate unit tests.

What seems to be missing is the correct detection of the "IndexBounds" that seemed to exist before.

Any comment?

Comment by Asya Kamsky [ 14/Mar/14 ]

Note: this requires an explicit hint with the compound index to reproduce.

Generated at Thu Feb 08 03:30:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.