[SERVER-13859] Sorting an indexed key in pipeline raises scanned and decreases performance Created: 07/May/14  Updated: 21/May/14  Resolved: 18/May/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: 2.2.3, 2.2.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Hartig Assignee: Unassigned
Resolution: Won't Fix Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File indexes.txt     Text File sortOnIndex.txt     Text File sortOnNonIndex.txt    
Issue Links:
Duplicate
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

db.test.ensureIndex(

{i: 1, t: 1, _id: -1, cs: 1}

);
db.test.insert(

{i: 1, t:1, ts: 1, cs: 1, c: []}

);
db.test.insert(

{i: 2, t:1, ts: 2, cs: 1, c: []}

);
db.test.insert(

{i: 3, t:1, ts: 3, cs: 1, c: []}

);
db.test.insert(

{i: 4, t:1, ts: 4, cs: 1, c: []}

);

//indexed key sort
db.test.runCommand('aggregate', {pipeline: [{"$match": {i:

{"$in":[1,2,3,4]}

, t: 1, cs:

{"$gt":0}

}}, {"$sort":{_id:-1}}], explain: true});

//non-indexed key sort
db.test.runCommand('aggregate', {pipeline: [{"$match": {i:

{"$in":[1,2,3,4]}

, t: 1, cs:

{"$gt":0}

}}, {"$sort":{ts:-1}}], explain: true});

The reason for the index above is to optimize:
db.test.find({i: 1, t: 1, cs: {"$gt":0}}).sort({_id: -1}).limit(1)

Participants:

 Description   

If I specify a sort aggregation on an indexed key, performance is greatly reduced and the number of scanned items goes up. Looking at the explain, it seems to be adding another plan. When I use a non-indexed key it will not cause another plan and executes quickly.

I know this affects 2.2.7 and 2.2.3 since I tested those both specifically. Didn't try anything between those. I'd like to see this backported into 2.2.8, if possible.



 Comments   
Comment by Asya Kamsky [ 18/May/14 ]

It's unlikely that any work could be done on this in 2.2 branch as its end-of-life was Feb-14.

Only major bugs or security issues are likely to be even considered for backport, and this is not a bug but rather the fact that aggregation had not had majority of its performance improvements until 2.4+ versions.

Comment by James Hartig [ 17/May/14 ]

Attached are the 2 queries running on 2.2.x and the indexes on the collection. The nonIndex and index query return the same results but the nonIndex is significantly faster. If you need exact numbers I can get them but it's a matter of 600+ ms vs <100 ms. You'll also notice that it looks at less items (according to the explain at least).

Comment by Asya Kamsky [ 16/May/14 ]

Can you please include your "explain" output (if it's huge, then as an attachment, otherwise just in comment with nofomat tags).

I'm afraid I must not see the same thing you are reporting.

Comment by James Hartig [ 16/May/14 ]

I read on the docs that reversing them might help but unfortunately it didn't.

I just tried it on 2.4.10 and it is still worse with the indexed key. I didn't migrate our data to 2.6.x but I tried it with the tests in my comment above and the explain looks worse with the indexed key. There's more plans but I don't actually see where the explain shows the scanned docs like previous versions.

Comment by Asya Kamsky [ 16/May/14 ]

fastest963 does this only happen on 2.2.x branch?

Also, do you see the same thing happening if you reverse the order in the pipeline of $sort and $match stages?

Comment by James Hartig [ 07/May/14 ]

The steps to reproduce ended up being horribly formatted. Here they are again:

db.test.ensureIndex({i: 1, t: 1, _id: -1, cs: 1});
db.test.insert({i: 1, t:1, ts: 1, cs: 1, c: []});
db.test.insert({i: 2, t:1, ts: 2, cs: 1, c: []});
db.test.insert({i: 3, t:1, ts: 3, cs: 1, c: []});
db.test.insert({i: 4, t:1, ts: 4, cs: 1, c: []});

indexed key sort

db.test.runCommand('aggregate', {pipeline: [
    {"$match": {i: {"$in": [1,2,3,4]}, t: 1, cs: {"$gt": 0}}},
    {"$sort": {_id: -1}}
], explain: true});

non-indexed key sort

db.test.runCommand('aggregate', {pipeline: [
    {"$match": {i: {"$in": [1,2,3,4]}, t: 1, cs: {"$gt": 0}}},
    {"$sort": {ts: -1}}
], explain: true});

The reason for the index above is to optimize:

db.test.find({i: 1, t: 1, cs: {"$gt": 0}}).sort({_id: -1}).limit(1)

Generated at Thu Feb 08 03:33:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.