[SERVER-44057] Find on non-indexed and sort on indexed is slower then find on non-indexed and sort on non-indexed Created: 17/Oct/19  Updated: 07/Nov/19  Resolved: 07/Nov/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 4.0.13
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Sean Morris Assignee: Eric Sedor
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-23406 index scan is slower than full collec... Backlog
Operating System: ALL
Steps To Reproduce:

Create a collection with 2 million entries containing 3 fields. Create an index on field C.

 

Do a find of A and sort on A.

Do a find of A and sort on C.

The second query is a magnitude slower then the first.

Participants:

 Description   

I have a collection with over 2 million entries in it. I allow find and sort on both indexed and non-indexed fields.
 
When I do a find and sort using only non-index fields it takes a few seconds and the query plan looks like this
 

/* 1 */
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "apc.radios",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "lifecycleState" : {
                "$eq" : "INVENTORY"
            }
        },
        "winningPlan" : {
            "stage" : "SORT",
            "sortPattern" : {
                "lifecycleState" : 1.0
            },
            "inputStage" : {
                "stage" : "SORT_KEY_GENERATOR",
                "inputStage" : {
                    "stage" : "COLLSCAN",
                    "filter" : {
                        "lifecycleState" : {
                            "$eq" : "INVENTORY"
                        }
                    },
                    "direction" : "forward"
                }
            }
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 111,
        "executionTimeMillis" : 1962,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 2202676,
        "executionStages" : {
            "stage" : "SORT",
            "nReturned" : 111,
            "executionTimeMillisEstimate" : 1799,
            "works" : 2202791,
            "advanced" : 111,
            "needTime" : 2202679,
            "needYield" : 0,
            "saveState" : 17210,
            "restoreState" : 17210,
            "isEOF" : 1,
            "invalidates" : 0,
            "sortPattern" : {
                "lifecycleState" : 1.0
            },
            "memUsage" : 38922,
            "memLimit" : 33554432,
            "inputStage" : {
                "stage" : "SORT_KEY_GENERATOR",
                "nReturned" : 111,
                "executionTimeMillisEstimate" : 1709,
                "works" : 2202679,
                "advanced" : 111,
                "needTime" : 2202567,
                "needYield" : 0,
                "saveState" : 17210,
                "restoreState" : 17210,
                "isEOF" : 1,
                "invalidates" : 0,
                "inputStage" : {
                    "stage" : "COLLSCAN",
                    "filter" : {
                        "lifecycleState" : {
                            "$eq" : "INVENTORY"
                        }
                    },
                    "nReturned" : 111,
                    "executionTimeMillisEstimate" : 1599,
                    "works" : 2202678,
                    "advanced" : 111,
                    "needTime" : 2202566,
                    "needYield" : 0,
                    "saveState" : 17210,
                    "restoreState" : 17210,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "direction" : "forward",
                    "docsExamined" : 2202676
                }
            }
        },
        "allPlansExecution" : []
    },
    "serverInfo" : {
        "host" : "netmet-mongo-d86c5c5bb-vd89w",
        "port" : 27017,
        "version" : "4.0.9",
        "gitVersion" : "fc525e2d9b0e4bceff5c2201457e564362909765"
    },
    "ok" : 1.0
}
  


When I do a find on a non-indexed field but sort on an indexed field it is significantly slower which is not what I would expect. I would expect it to be the same or faster then the previous query since it should be using the index for the sort. Here is the query plan
  

/* 1 */
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "apc.radios",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "lifecycleState" : {
                "$eq" : "INVENTORY"
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "lifecycleState" : {
                    "$eq" : "INVENTORY"
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "repId" : 1
                },
                "indexName" : "repId_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "repId" : []
                },
                "isUnique" : true,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "repId" : [ 
                        "[MinKey, MaxKey]"
                    ]
                }
            }
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 111,
        "executionTimeMillis" : 9362,
        "totalKeysExamined" : 2202669,
        "totalDocsExamined" : 2202669,
        "executionStages" : {
            "stage" : "FETCH",
            "filter" : {
                "lifecycleState" : {
                    "$eq" : "INVENTORY"
                }
            },
            "nReturned" : 111,
            "executionTimeMillisEstimate" : 9070,
            "works" : 2202670,
            "advanced" : 111,
            "needTime" : 2202558,
            "needYield" : 0,
            "saveState" : 17213,
            "restoreState" : 17213,
            "isEOF" : 1,
            "invalidates" : 0,
            "docsExamined" : 2202669,
            "alreadyHasObj" : 0,
            "inputStage" : {
                "stage" : "IXSCAN",
                "nReturned" : 2202669,
                "executionTimeMillisEstimate" : 1152,
                "works" : 2202670,
                "advanced" : 2202669,
                "needTime" : 0,
                "needYield" : 0,
                "saveState" : 17213,
                "restoreState" : 17213,
                "isEOF" : 1,
                "invalidates" : 0,
                "keyPattern" : {
                    "repId" : 1
                },
                "indexName" : "repId_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "repId" : []
                },
                "isUnique" : true,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "repId" : [ 
                        "[MinKey, MaxKey]"
                    ]
                },
                "keysExamined" : 2202669,
                "seeks" : 1,
                "dupsTested" : 0,
                "dupsDropped" : 0,
                "seenInvalidated" : 0
            }
        },
        "allPlansExecution" : []
    },
    "serverInfo" : {
        "host" : "netmet-mongo-d86c5c5bb-vd89w",
        "port" : 27017,
        "version" : "4.0.9",
        "gitVersion" : "fc525e2d9b0e4bceff5c2201457e564362909765"
    },
    "ok" : 1.0
}
 



 Comments   
Comment by Eric Sedor [ 23/Oct/19 ]

SERVER-23406 is considered an improvement rather than a bug. Importantly, directly comparing the performance between these queries is misleading because the differing sort makes them different queries, even if the result set is the same.

In the case of the above queries, the Query Planner is more at the mercy of certain basic concerns. Consider:

  • Both queries require visiting all documents
  • The second query also requires visiting all index nodes.
  • Were you to experiment with a limit of less than 111 to these queries, you may even start to see how the location of the documents in the index and collection start to become a factor in the speed at which results are returned.

Generally it is best to avoid these concerns by ensuring very good indexes for queries, such as {lifecycleState:1, repId:1} for the latter.

Does this make sense?

For further assistance reasoning about query performance, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Gratefully,
Eric

Comment by Sean Morris [ 22/Oct/19 ]

I see that SERVER-23406 has been open for awhile. Is this a confirmed bug and just not resolved yet, or is it considered to be working as expected? Thanks, Sean

Comment by David Storch [ 22/Oct/19 ]

eric.sedor sean.morris@xyleminc.com, at a cursory glance it appears that this may be the same issue as SERVER-23406. The fast plan is a collection scan followed by a blocking SORT. The slow plan has to scan an entire index. Although the fast plan needs to perform a blocking SORT operation and the slow one does not, the execution stats show that we only need to sort 111 matching documents, which should be pretty cheap. So the difference in performance appears to be coming primarily from the fact that scanning the whole collection is faster than scanning the entire {"repId": 1} index. SERVER-23406 has additional technical details about this issue.

Comment by Sean Morris [ 18/Oct/19 ]

Here is the first query:

db.getCollection('radios').find({"lifecycleState":"INVENTORY"}).sort({"lifecycleState":1})

Here is the second one:

db.getCollection('radios').find({"lifecycleState":"INVENTORY"}).sort({"repId":1})

 

There is no index on lifecycleState. The index on repId is unique and ascending.

 

Comment by Eric Sedor [ 17/Oct/19 ]

Hi sean.morris@xyleminc.com, thanks for this report. To help us interpret the explain output you're seeing and assist us in attempting an internal reproduction, could you please provide the exact operation you ran to produce these explain plans?

Gratefully,
Eric

Generated at Thu Feb 08 05:04:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.