[SERVER-11447] aggregation can sort using index to speed up group of an indexed field
Created: 29/Oct/13  Updated: 12/Mar/17  Resolved: 08/Apr/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.4.7, 2.5.3
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Asya Kamsky
Assignee: Unassigned
Resolution: Duplicate
Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-4507 aggregation: optimize $group to take... Backlog
is duplicated by SERVER-14303 Allow aggregation $group operator to ... Closed
is duplicated by SERVER-15291 slow '$group' performance Closed
Related
Backwards Compatibility: Fully Compatible
Participants:

 Description   

If "$group" is grouping on an indexed field F, and none of the accumulator functions depend on the rest of the document (such as $sum:1, i.e. a count), a huge performance improvement can be made by adding '{$sort:{F:1}}' before the '{$group}'.
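
For illustration, the two pipeline shapes being compared might look like the sketch below. The helper `groupCountPipeline`, the field name `o_orderdate`, and the collection name `orders` are assumptions made for this example; the ticket does not give the actual names used in the test.

```javascript
// Hypothetical helper: builds the two pipeline shapes compared in this ticket.
// `field` stands for the indexed field F. The accumulator is $sum: 1 (a count),
// which does not depend on the rest of the document.
function groupCountPipeline(field, presort) {
  const group = { $group: { _id: "$" + field, count: { $sum: 1 } } };
  if (!presort) {
    return [group];
  }
  // Put the $sort on the indexed field ahead of the $group.
  return [{ $sort: { [field]: 1 } }, group];
}

// Assumed field name, for illustration only.
const withoutSort = groupCountPipeline("o_orderdate", false);
const withSort = groupCountPipeline("o_orderdate", true);

// In the mongo shell, assuming a collection `orders` indexed on o_orderdate:
//   db.orders.aggregate(withoutSort)
//   db.orders.aggregate(withSort)
```

Both pipelines produce the same groups and counts; only the leading $sort stage differs.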

Tested on a large collection (TPC-H orders denormalized with line items embedded) of about 1.5 million documents, aggregating by order date (2600 distinct dates), with the data warmed first in both cases:

Without sort: 18-19 seconds
With sort: 2.5-2.6 seconds

Even on really small datasets I still see at least a 25%-33% improvement with $sort, so if we can do that "automatically" it would help performance.



 Comments   
Comment by Dan Doyle [ 15/Jan/15 ]

This is a very important issue for our use case as well. We currently do an aggregate to determine a total number of possible unique results and this issue accounts for most of the runtime of any data fetch.

Comment by Sylvain Zimmer [ 03/Jan/15 ]

Any news on this issue? Thanks!
