[SERVER-13859] Sorting an indexed key in pipeline raises scanned and decreases performance Created: 07/May/14 Updated: 21/May/14 Resolved: 18/May/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | 2.2.3, 2.2.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | James Hartig | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Steps To Reproduce: | db.test.ensureIndex( {i: 1, t: 1, _id: -1, cs: 1}); ); ); ); ); //indexed key sort , t: 1, cs: {"$gt":0}}}, {"$sort":{_id:-1}}], explain: true}); //non-indexed key sort , t: 1, cs: {"$gt":0}}}, {"$sort":{ts:-1}}], explain: true}); The reason for the index above is to optimize: |
||||
| Participants: | |||||
| Description |
|
If I specify a sort aggregation on an indexed key, performance is greatly reduced and the number of scanned items goes up. Looking at the explain, it seems to be adding another plan. When I use a non-indexed key it will not cause another plan and executes quickly. I know this affects 2.2.7 and 2.2.3 since I tested those both specifically. Didn't try anything between those. I'd like to see this backported into 2.2.8, if possible. |
| Comments |
| Comment by Asya Kamsky [ 18/May/14 ] | ||||||||||||||
|
It's unlikely that any work could be done on this in 2.2 branch as its end-of-life was Feb-14. Only major bugs or security issues are likely to be even considered for backport, and this is not a bug but rather the fact that aggregation had not had majority of its performance improvements until 2.4+ versions. | ||||||||||||||
| Comment by James Hartig [ 17/May/14 ] | ||||||||||||||
|
Attached are the 2 queries running on 2.2.x and the indexes on the collection. The nonIndex and index query return the same results but the nonIndex is significantly faster. If you need exact numbers I can get them but it's a matter of 600+ ms vs <100 ms. You'll also notice that it looks at less items (according to the explain at least). | ||||||||||||||
| Comment by Asya Kamsky [ 16/May/14 ] | ||||||||||||||
|
Can you please include your "explain" output (if it's huge, then as an attachment, otherwise just in comment with nofomat tags). I'm afraid I must not see the same thing you are reporting. | ||||||||||||||
| Comment by James Hartig [ 16/May/14 ] | ||||||||||||||
|
I read on the docs that reversing them might help but unfortunately it didn't. I just tried it on 2.4.10 and it is still worse with the indexed key. I didn't migrate our data to 2.6.x but I tried it with the tests in my comment above and the explain looks worse with the indexed key. There's more plans but I don't actually see where the explain shows the scanned docs like previous versions. | ||||||||||||||
| Comment by Asya Kamsky [ 16/May/14 ] | ||||||||||||||
|
fastest963 does this only happen on 2.2.x branch? Also, do you see the same thing happening if you reverse the order in the pipeline of $sort and $match stages? | ||||||||||||||
| Comment by James Hartig [ 07/May/14 ] | ||||||||||||||
|
The steps to reproduce ended up being horribly formatted. Here they are again:
indexed key sort
non-indexed key sort
The reason for the index above is to optimize:
|