[SERVER-18115] The planner can add an unnecessary in-memory sort stage for .min()/.max() queries Created: 19/Apr/15  Updated: 17/Nov/16  Resolved: 02/Feb/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.9, 3.0.2
Fix Version/s: 3.2.3, 3.3.2

Type: Bug Priority: Major - P3
Reporter: Pierre Coquentin Assignee: Tess Avitabile (Inactive)
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04


Issue Links:
Related
is related to SERVER-15086 Allow for efficient range queries ove... Closed
Backwards Compatibility: Fully Compatible
Operating System: Linux
Backport Completed:
Steps To Reproduce:

So, first step, create a collection with a compound multikey index.

> var now = new Date().getTime(); 
> for (var i = 0; i < 10000; i++) { 
       db.SampleLong.insert({_id: ObjectId(), a: [{_id: 1}, {_id: i + 2}], t: NumberLong(now + i)}) 
  }
> db.SampleLong.ensureIndex({"a._id": 1, t: -1}, {background: true})

In 2.4.13, we have this when executing an ordered query using min / max

> db.SampleLong.find({"a._id": 1, t: {"$lt": NumberLong("1429191483598"), "$gte": NumberLong("1429191473599")}}).min({"a._id": 1, t: NumberLong("1429191483598")}).max({"a._id": 1, t: NumberLong("1429191473599")}).limit(10).sort({t: -1}).explain()
{
    "cursor" : "BtreeCursor a._id_1_t_-1",
    "isMultiKey" : true,
    "n" : 10,
    "nscannedObjects" : 11,
    "nscanned" : 11,
    "nscannedObjectsAllPlans" : 11,
    "nscannedAllPlans" : 11,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "start" : {
            "a._id" : 1,
            "t" : NumberLong("1429191483598")
        },
        "end" : {
            "a._id" : 1,
            "t" : NumberLong("1429191473599")
        }
    },
    "server" : "frank:27017"
}

And, now, the same query but on 2.6.9 (same result with MongoDB 3.0.2). MongoDB set the scanAndOrder to true whereas the index is already sorted.

> db.SampleLong.find({"a._id": 1, t: {"$lt": NumberLong("1429191483598"), "$gte": NumberLong("1429191473599")}}).min({"a._id": 1, t: NumberLong("1429191483598")}).max({"a._id": 1, t: NumberLong("1429191473599")}).limit(10).sort({t: -1}).explain()
{
    "clauses" : [
        {
            "cursor" : "BtreeCursor a._id_1_t_-1",
            "isMultiKey" : true,
            "n" : 10,
            "nscannedObjects" : 9999,
            "nscanned" : 10000,
            "scanAndOrder" : true,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
 
            }
        },
        {
            "cursor" : "BtreeCursor a._id_1_t_-1",
            "isMultiKey" : true,
            "n" : 0,
            "nscannedObjects" : 0,
            "nscanned" : 0,
            "scanAndOrder" : true,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
 
            }
        }
    ],
    "cursor" : "QueryOptimizerCursor",
    "n" : 10,
    "nscannedObjects" : 9999,
    "nscanned" : 10000,
    "nscannedObjectsAllPlans" : 9999,
    "nscannedAllPlans" : 10000,
    "scanAndOrder" : false,
    "nYields" : 78,
    "nChunkSkips" : 0,
    "millis" : 49,
    "server" : "frank:27017",
    "filterSet" : false
}

Sprint: Query F (02/01/16), Query 10 (02/22/16)
Participants:

 Description   

IndexScanNode::computeProperties() computes sort orders provided by the index scan. It obtains some of these sort orders by looking at the contents of its IndexBounds instance. However, it fails to compute some obvious sort orders when isSimpleRange is true. As a result, plans for .min()/.max() queries may involve an unnecessary in-memory SORT stage.

Original Description

We currently run a sharding in production in 2.4.13 with a fairly large collection and a compound multikey index. We met the problem with multi key index with unbound limit that we solved using cursor.min and cursor.max.
But after upgrading to 2.6.9, we realized that the behavior between the two versions has changed. MongoDB uses the correct index but specifies a scanAndOrder to true even if the index is well sorted.
I have asked the question on stackoverflow but didn't receive so far any answers.
http://stackoverflow.com/questions/29679635/behavior-has-changed-between-mongodb-2-4-and-2-6-for-cursor-min-max



 Comments   
Comment by Githook User [ 02/Feb/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: SERVER-18115 Avoid unnecessary in-memory sort stage for .min()/.max() queries

(cherry picked from commit ecfac2be1f855c19ef6b50eef712fd3b6b9631f7)
Branch: v3.2
https://github.com/mongodb/mongo/commit/44f0661a7a770bad193582935d001b4754389b34

Comment by Githook User [ 02/Feb/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: SERVER-18115 Avoid unnecessary in-memory sort stage for .min()/.max() queries
Branch: master
https://github.com/mongodb/mongo/commit/ecfac2be1f855c19ef6b50eef712fd3b6b9631f7

Comment by Ramon Fernandez Marina [ 25/Nov/15 ]

PierreCoquentin, this ticket is currently not scheduled, so we can't provide an estimate of when it will be fixed. Any scheduling changes will be posted here when they happen.

Comment by Pierre Coquentin [ 17/Nov/15 ]

Hello,

Is it possible to know if this bug will be corrected in the next version of mongodb ? Because, we are blocked to upgrade to mongodb 3.0 and thus benefit of all improvements done.
Best regards

Comment by Pierre Coquentin [ 20/Apr/15 ]

The explanation is quite clear.
Thanks

Comment by David Storch [ 20/Apr/15 ]

Hi pcoq,

Thanks for reporting this issue. I can confirm that the rewrite to the query engine introduced this problem. Note that this is purely a performance problem, i.e. the query engine is failing to recognize that an in-memory sort of the result set is unnecessary. However, given that this optimization was present in previous versions and is now missing, it is fair to consider this a bug.

The problem stems from the fact that index bounds creating using the .min()/.max() operators are stored using a different format than those created from regular query predicates (see index_bounds.h for details). We have the special format for .min()/.max() because these operators allow bounds to be expressed that are not expressable using the standard format. I have a comment explaining why on SERVER-17076.

The sort analysis code, which is responsible for determining whether or not an in-memory SORT stage is needed, operates only over the standard index bounds format. It does not consider the special "isSimpleRange" bounds format when doing sort analysis, which is the root cause of the problem.

Hope that helps to clear things up, but let me know if you have any further questions. I'm going to move this ticket into "Needs Triage" state. Please continue to watch this ticket for progress updates.

Best,
Dave

Generated at Thu Feb 08 03:46:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.