[SERVER-13065] Consider a collection scan even if indexed plans are available Created: 05/Mar/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.9, 2.6.0-rc0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Michael Kennedy Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: bonsai
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive bookstore.zip    
Issue Links:
Duplicate
is duplicated by SERVER-29227 $nin operation choose IXSCAN but not ... Closed
is duplicated by SERVER-12871 Seemingly unreasonable overhead to ra... Closed
is duplicated by SERVER-33556 range scan query optimizing Closed
is duplicated by SERVER-52868 Optimizing update query Closed
is duplicated by SERVER-53709 MongoDB Time-Spatial Query is slower ... Closed
Related
Assigned Teams:
Query Optimization
Participants:

 Description   

Restore the database from the attached BSON data (I called it BookStore FWIW).

Then this query without indexes takes about 300 ms and uses basic cusor as you would expect:

> db.Book.find({ "Ratings" : { "$elemMatch" : { "Value" : { "$gte" : 0 } } } } )

Running explain we see it is using a cursor and it takes 891 ms.

{
        "cursor" : "BtreeCursor Ratings.Value_1",
        "isMultiKey" : true,
        "n" : 270171,
        "nscannedObjects" : 270171,
        "nscanned" : 443781,
        "nscannedObjectsAllPlans" : 270270,
        "nscannedAllPlans" : 443880,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 3467,
        "nChunkSkips" : 0,
        "millis" : 891,
        "indexBounds" : {
                "Ratings.Value" : [
                        [
                                0,
                                Infinity
                        ]
                ]
        },
        "server" : "WIN-7S2IMPQ2TOE:27017",
        "filterSet" : false
}
>

Now, remove that index:

> db.Book.dropIndex("Ratings.Value_1")
{ "nIndexesWas" : 7, "ok" : 1 }

And you will see it runs in 300ms!

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 270171,
        "nscannedObjects" : 271380,
        "nscanned" : 271380,
        "nscannedObjectsAllPlans" : 271380,
        "nscannedAllPlans" : 271380,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 2120,
        "nChunkSkips" : 0,
        "millis" : 317,
        "server" : "WIN-7S2IMPQ2TOE:27017",
        "filterSet" : false
}

It seems like mongo should either:

  • Not use the index
  • Acheive at least as good of perf

To make matters much worse, I was actually running just a count operation via the C# driver and go even worse perf. Seems like there should be almost no perf cost for a count operation on an index set.



 Comments   
Comment by Asya Kamsky [ 27/Mar/18 ]

Ah yes, got it.

Comment by Matthew Kruse [ 27/Mar/18 ]

Asya, this issue was closed a a duplicate of this one. It has my summary of what I experienced on 3.6.2

https://jira.mongodb.org/browse/SERVER-33556

Comment by Asya Kamsky [ 27/Mar/18 ]

mkruse@adobe.com can you please clarify what exactly you are observing?

Comment by Matthew Kruse [ 26/Mar/18 ]

I'm still seeing this behavior on mongo 3.6.2. Be nice not to have to control for this with query hints and have the optimizer do this for everyone.

Comment by Michael Kennedy [ 10/Mar/14 ]

Thanks. I know it's kind of extreme, but I appreciate you following up on it.!

Comment by David Storch [ 10/Mar/14 ]

Hi Michael,

Thanks for another detailed bug report and for attaching the data so that we can reproduce the issue. The discrepancy in performance between the collection scan and the indexed plan occurs because your example query returns nearly all of the documents in the collection. There are 271380 documents in the "Books" collection, and the query returns 270171 (or 99.55%). It is not surprising that the indexed solution takes longer in this case: both plans scan through about 270 thousand documents, but the indexed plan incurs the additional cost of scanning through about 270 thousand index keys.

In both versions 2.4.9 and 2.6.0-rc0, MongoDB will select an indexed plan if one is available. This is why you see the slower indexed plan being chosen over the collection scan solution. Careful design of schema and indices can prevent such a scenario. Nevertheless, you could imagine a query optimizer which would be able to intelligently choose among indexed plans and a collection scan in order to avoid this automatically. I'm going to retitle this ticket to that effect and get it triaged.

Comment by Michael Kennedy [ 05/Mar/14 ]

Sorry I forgot to add this when I submitted:

I tested this both on 2.4.9 and 2.6.0_rc0.

Generated at Thu Feb 08 03:30:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.