[SERVER-55422] Should use wildcard index for descending sort with limit Created: 22/Mar/21  Updated: 06/Dec/22  Resolved: 25/Mar/21

Status: Closed
Project: Core Server
Component/s: Query Planning
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Query Optimization
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

Currently we don't use wildcard indexes for any sorts unless the query would restrict to documents that have the field we are sorting on. This is because they are sparse indexes and don't have entries for documents that are missing the field. However this is too restrictive for descending sorts with limits (eg find().sort({score:-1}).limit(1)) because null and missing values would be at the end rather than the beginning of the sorted output. This means that we can scan the index to return the top K results. If we hit the end of a path-prefixed [MaxKey, null) scan before collecting K results, then we will know that we need to fall back to a table scan to sort all documents with a nullish value for the sort field. But that is unlikely to happen in practice with a top K sort*, and if it does, we will have only wasted a relatively small amount of work in the index.

* Notable exceptions include on an empty collection where everything is ~free and when the user typos a field name in the sort, in which case the index scan will be ~free.



 Comments   
Comment by Mathias Stearn [ 30/Mar/21 ]

Sorry for not replying sooner, but I think I agree with this outcome. I was somewhat mislead by a misreading of the output when adding a {x: {$ne: null}} predicate, since it looks like it does use the index, but not for providing a sort:

> db.blah.createIndex({'$**': 1})
{
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "createdCollectionAutomatically" : true,
        "ok" : 1
}
> db.blah.find({x:{$ne:null}}).sort({x: -1}).explain()
{
        "explainVersion" : "1",
        "queryPlanner" : {
                "namespace" : "test.blah",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "x" : {
                                "$not" : {
                                        "$eq" : null
                                }
                        }
                },
                "queryHash" : "01F26CBA",
                "planCacheKey" : "560833DE",
                "maxIndexedOrSolutionsReached" : false,
                "maxIndexedAndSolutionsReached" : false,
                "maxScansToExplodeReached" : false,
                "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "x" : -1
                        },
                        "memLimit" : 104857600,
                        "type" : "simple",
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "x" : {
                                                "$not" : {
                                                        "$eq" : null
                                                }
                                        }
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "$_path" : 1,
                                                "x" : 1
                                        },
                                        "indexName" : "$**_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                                "$_path" : [ ],
                                                "x" : [ ]
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "$_path" : [
                                                        "[\"x\", \"x\"]",
                                                        "[\"x.\", \"x/\")"
                                                ],
                                                "x" : [
                                                        "[MinKey, MaxKey]"
                                                ]
                                        }
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "command" : {
                "find" : "blah",
                "filter" : {
                        "x" : {
                                "$ne" : null
                        }
                },
                "sort" : {
                        "x" : -1
                },
                "$db" : "test"
        },
        "serverInfo" : {
                "host" : "ip-10-122-10-16",
                "port" : 27017,
                "version" : "4.9.0-alpha4-13-gbed3256",
                "gitVersion" : "bed32560b4ef8df1eb6635c6d756119ab0e685a4"
        },
        "ok" : 1
}

And it looks like we already do the right thing if you use {x: {$type: 'number'}} to clamp it to numeric values:

> db.blah.find({x:{$type:'number'}}).sort({x: -1}).explain()
{
        "explainVersion" : "1",
        "queryPlanner" : {
                "namespace" : "test.blah",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "x" : {
                                "$type" : [
                                        "number"
                                ]
                        }
                },
                "queryHash" : "3592C2C7",
                "planCacheKey" : "5167D264",
                "maxIndexedOrSolutionsReached" : false,
                "maxIndexedAndSolutionsReached" : false,
                "maxScansToExplodeReached" : false,
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "$_path" : 1,
                                        "x" : 1
                                },
                                "indexName" : "$**_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "$_path" : [ ],
                                        "x" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "backward",
                                "indexBounds" : {
                                        "$_path" : [
                                                "[\"x\", \"x\"]"
                                        ],
                                        "x" : [
                                                "[inf.0, nan.0]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "command" : {
                "find" : "blah",
                "filter" : {
                        "x" : {
                                "$type" : "number"
                        }
                },
                "sort" : {
                        "x" : -1
                },
                "$db" : "test"
        },
        "serverInfo" : {
                "host" : "ip-10-122-10-16",
                "port" : 27017,
                "version" : "4.9.0-alpha4-13-gbed3256",
                "gitVersion" : "bed32560b4ef8df1eb6635c6d756119ab0e685a4"
        },
        "ok" : 1
}

So it currently is possible to get a sort out of a wildcard index, but it requires adding a possibly unnecessary predicate to help out the optimizer, since it doesn't know whether there are no non-numeric values.

Comment by Ian Boros [ 22/Mar/21 ]

Something else worth keeping in mind the fact that wildcard indexes do not store objects as "blobs." That is, if there's a document {a: {b: 1, c: 1}} a wildcard index will have keys for "a.b" and "a.c" but not for "a".

The bounds would have to account for this to ensure that we switch to a collection scan and not "miss" any documents where the field we're sorting by is an object. Unfortunately, with the way BSON types are ordered, I believe we'd be falling back to coll scan in any case where a document in the result set has a string or number in this field.

BSON type ordering:

        ... 
        case NumberDecimal:
        case NumberDouble:
        case NumberInt:
        case NumberLong:
            return 10;
        case mongo::String:
        case Symbol:
            return 15;
        case Object:
            return 20;
        case mongo::Array:
            return 25;
        case BinData:
            return 30;
        case jstOID:
            return 35;
...

Generated at Thu Feb 08 05:36:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.