[SERVER-4507] aggregation: optimize $group to take advantage of sorted sequences Created: 15/Dec/11  Updated: 09/Jan/24

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 54
Labels: optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-4566 In new aggregation framework, $sort, ... Closed
is depended on by SERVER-5795 Very Poor Performances Closed
is depended on by SERVER-84466 Complete TODO listed in SERVER-23318 Backlog
Duplicate
is duplicated by SERVER-41891 $group on sorted cursor forces full c... Closed
is duplicated by SERVER-11447 aggregation can sort using index to s... Closed
is duplicated by SERVER-37242 $group on sort key (after $sort) coul... Closed
is duplicated by SERVER-21502 $group using index Closed
Related
related to SERVER-4961 $group is taking 2x as long as collec... Closed
related to SERVER-23477 Aggregation's streaming $group should... Closed
related to SERVER-5361 early $group should provide a hint to... Closed
related to SERVER-20066 Query planner should consider index s... Closed
related to SERVER-40056 Remove disabled partial implementatio... Closed
related to SERVER-55576 Optimize queries on time-series colle... Closed
is related to SERVER-2130 Ability to use Limit() with Distinct() Backlog
is related to SERVER-447 new aggregation framework Closed
is related to SERVER-29444 Use a covered and streaming plan for ... Backlog
is related to SERVER-9507 Optimize $sort+$group+$first pipeline... Closed
is related to SERVER-21022 Coalesce $group and $limit in aggrega... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Sprint: Query 10 (02/22/16), Query 11 (03/14/16), Query 12 (04/04/16)
Participants:
Case:

 Description   

If the input to a $group is sorted by the _id, then only one bucket needs to be maintained at a time. This will reduce memory requirements significantly.

We can examine the _id and take advantage of this if we can force an index scan on the key, provided there are no intervening pipeline operations that would affect the results.
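The one-bucket-at-a-time idea can be sketched outside the server: when input arrives sorted by the group key, the accumulator for the current key can be emitted as soon as the key changes, so memory stays O(1) in the number of groups instead of O(n). A minimal Python sketch (the helper names are illustrative, not server code):

```python
def streaming_group(sorted_docs, key, init, accumulate):
    """Group documents that arrive sorted by `key`, holding only one
    bucket in memory at a time (vs. a hash table of all groups)."""
    current_key = object()  # sentinel that matches no real key
    bucket = None
    for doc in sorted_docs:
        k = doc[key]
        if k != current_key:
            if bucket is not None:
                yield bucket              # key changed: emit finished group
            current_key, bucket = k, init(k)
        accumulate(bucket, doc)
    if bucket is not None:
        yield bucket                      # emit the final group

# Example: {$group: {_id: "$a", count: {$sum: 1}}} over input sorted on "a"
docs = [{"a": 1}, {"a": 1}, {"a": 2}, {"a": 3}, {"a": 3}]
groups = list(streaming_group(
    docs, "a",
    init=lambda k: {"_id": k, "count": 0},
    accumulate=lambda b, d: b.update(count=b["count"] + 1)))
```

A hash-based $group must keep every bucket until the input is exhausted; the streaming variant above can also emit results incrementally, which is what makes a non-blocking pipeline possible.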



 Comments   
Comment by Gavin Price [ 22/Sep/21 ]

Hi Asya,

> Please note however that in your example $limit:10 applies to the number of grouped documents kept (which is inconsistent with your comment about keeping first ten documents for each group).

I actually do want the limit to apply to the entire aggregation, not individual groups. I've edited my comment to make that more clear. The problem I'm having with the query in my comment is that there are millions of groups in my DB, and so the query runs out of memory storing all those groups. If the limit was applied while the grouping process was ongoing, the process would only ever have to store <=10 groups, and could then return.

 

Does this change the tickets you think are pertinent?

Comment by Asya Kamsky [ 22/Sep/21 ]

gaprice@lbl.gov

I believe part of the use case you describe is actually covered by SERVER-9507 and SERVER-40090 (and related linked tickets).  For your case of multiple fields to be grouped on, SERVER-53626 would be the ticket to watch/vote up.

At least the more efficient use of the index to make $group more efficient is covered; a post-$group limit cannot be absorbed into $group unless $group is known to be streaming.

Please note however that in your example $limit:10 applies to the number of grouped documents kept (which is inconsistent with your comment about keeping first ten documents for each group).

Comment by Gavin Price [ 21/Sep/21 ]

I believe this is a use case that would be enabled by this change, assuming $group and $limit can coalesce:

db.mycol.aggregate([
   {$sort: {container: 1, id: 1, ver: -1}}, 
   {$group: {_id:{container: "$container", id: "$id"}, data: {$first: "$$ROOT"}}}, 
   {$limit: 10}
])

In this case, there is an index that covers the sort, so the aggregation effectively just needs to walk the index and find the maximum version for each container and id, stopping after 10 documents / groups. Instead, the aggregation fails because it groups every container and id, and there are ~20M groups in the DB.
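The behavior Gavin is asking for can be sketched as follows: because the sort puts the highest ver first within each (container, id) group, the first document seen for a key is exactly the $first/$$ROOT result, and a streaming $group that had absorbed the $limit could stop after ten groups. A hypothetical Python sketch (not MongoDB code):

```python
def first_per_group_with_limit(sorted_docs, limit):
    """Walk documents sorted on (container, id, ver desc).  The first
    document of each (container, id) group is its $first result; a
    streaming $group with an absorbed $limit stops after `limit` groups."""
    results, last_key = [], None
    for doc in sorted_docs:
        key = (doc["container"], doc["id"])
        if key != last_key:                # a new group starts here
            results.append(doc)            # $first: highest-version doc
            last_key = key
            if len(results) == limit:      # $limit absorbed into $group
                break
        # remaining docs in the group are skipped without being buffered
    return results

docs = [  # already sorted by container:1, id:1, ver:-1
    {"container": "a", "id": 1, "ver": 3},
    {"container": "a", "id": 1, "ver": 2},
    {"container": "a", "id": 2, "ver": 9},
    {"container": "b", "id": 1, "ver": 1},
]
top = first_per_group_with_limit(docs, limit=2)
```

Nothing beyond the current group's first document is ever held in memory, so the ~20M-group case would no longer exhaust memory.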

Comment by Guido Imperiale [ 25/Jun/19 ]

A use case where this would give a 100x speed boost is described in SERVER-41891

Comment by John Watts [ 05/Nov/18 ]

I agree that I should not need to spill to disk (external sort) but the fact is that I do. And as I understand it, that is the issue I'm commenting on. The $group stage could know that its input is sorted because it used an index but it does not take advantage of that fact to avoid blocking the pipeline and consuming a bunch of memory:

 

> db.User.explain().aggregate({$match: {emailAddress: {$exists: true}}}, {$limit: 1000000}, {$group: {_id: "$emailAddress", count: {$sum: 1}}})

...

"winningPlan" : { "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" :

{ "emailAddress" : 1 }

...

> db.User.aggregate({$match: {emailAddress: {$exists: true}}}, {$limit: 1000000}, {$group: {_id: "$emailAddress", count: {$sum: 1}}})

assert: command failed:

{ "ok" : 0, "errmsg" : "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.", "code" : 16945, "codeName" : "Location16945" }

 

 

Regarding "eventually-consistent" you are right of course. I was using completely the wrong term. What I meant was that only the _id index of a sharded collection can be a unique index https://docs.mongodb.com/v3.6/core/sharding-shard-key/#sharding-shard-key-unique

Maintaining an invariant in your application such as "users, keyed by user id, also have unique email addresses" requires the application to do maintenance work which is made more difficult because $group does not take advantage of its input being sorted.

Comment by Asya Kamsky [ 09/Oct/18 ]

jwatts I'm not sure I understand your case or how it relates to this ticket. The index would be used by the $match in option 1, and the documents would already be sorted on that field, so $group would already be taking advantage of it. This ticket is tracking the work for the case where $group does not get its data in sorted order. It's also not clear that you would need to spill to disk here, as the data will be coming in already sorted. Option 2 seems to describe the client doing equivalent work.

> Deduplicating on an indexed field is an expected use case, especially since Mongo is eventually-consistent.

I'm not sure what you mean by eventually-consistent - MongoDB actually isn't eventually consistent (you see the data in the same state/order on all nodes, unlike in eventually consistent systems where updates on different fields/records may arrive to different nodes in different order). I'm also not clear how this relates to this particular ticket.

Comment by John Watts [ 27/Sep/18 ]

Here's a use case I just encountered where I would expect the performance improvement to be significant. I have a large collection and I want to find duplicates on an indexed field that is only sometimes populated. As I understand it, I have the following options:

  1. Use the index in a $match to load the documents with the field populated, then $group and $match only count > 1. Spill to disk for sort.
  2. Use a find on the index and return all the documents that have the field, in order and skip the (many) which are not actually duplicates.

Deduplicating on an indexed field is an expected use case, especially since Mongo is eventually-consistent.

Comment by Asya Kamsky [ 14/Mar/17 ]

First, let me say that we are doing a lot of work on improving performance of all stages in aggregation pipeline.

However, as the description says, this ticket is only tracking $group using less resources (memory) and being more efficient when its input is already sorted by the group key. Our tests show that this improvement alone does not provide a significant performance improvement in most use cases.

There are other tickets that track performance improvements. Index selection is covered by SERVER-20066: currently, when an aggregation has a $group stage first, we do a collection scan rather than considering an index scan, even though an index may sometimes provide (covered) results in sorted order for $group to take advantage of. Note that we also implemented support for hints in aggregation (SERVER-27848), which adds a way to "tell" the aggregation to use a particular index.

Our apologies for the circular nature of the links between SERVER-23318 and SERVER-4507. This was a bookkeeping error on our end. The remaining work required to implement a non-blocking $group operator is tracked by this ticket, not by SERVER-23318.

Comment by Oleg Rekutin [ 10/Mar/17 ]

Disappointed that this was moved out of 3.4 and into 3.6.

This is far more than a mere optimization, it's a major performance feature. Solving this issue opens up a new class of data design options, which we cannot employ right now due to inability to performantly aggregate over these data designs.

For one example, with MMAP one could get away with larger documents, since updates to them were often applied in place. WiredTiger, however, no longer does in-place updates; any change rewrites the complete document. Some of the options for breaking those larger documents down are impractical due to memory constraints, given the lack of SERVER-4507.

Finally, I see that the summary is circular. The quote above says "disabled until SERVER-23318 has been completed", but SERVER-23318 has been closed as essentially a duplicate of this very issue, SERVER-4507.

It would be very helpful if someone could provide some hints on the remaining areas of work here.

Comment by Ramon Fernandez Marina [ 27/Jul/16 ]

The description of this ticket had the following line on it:

This functionality has been committed, but is disabled until SERVER-23318 has been completed.

This essentially means that this ticket should not have been resolved, as it doesn't provide the functionality it advertises. I'm therefore reopening this ticket so the necessary work on it can be completed.

Comment by Githook User [ 24/Mar/16 ]

Author: Benjamin Murphy <benjamin_murphy@me.com> (benjaminmurphy)

Message: SERVER-4507 Group stages now take advantage of sorted input sequences.
Branch: master
https://github.com/mongodb/mongo/commit/f4bbde02bab191cdba4195ec9ad73c60d4aece41

Comment by Chris Westin [ 29/Mar/12 ]

Note that in sharded setups, this could be complicated to merge on mongos. The behavior will have to be similar to using a merge sort there, as the sizes of buckets and their existence or not on different shards will vary.
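The merge Chris describes can be sketched in miniature: each shard emits partial groups in key order, and the mongos-side merger merge-sorts the streams, combining adjacent partial buckets that share a key (since a group may exist on some shards and not others). A hypothetical Python sketch using a $sum-style accumulator:

```python
import heapq

def merge_shard_groups(shard_streams):
    """Merge per-shard partial groups (each stream sorted by _id) the way
    a mongos-style merger might: merge-sort the streams by key, then
    combine adjacent partial buckets with the same _id."""
    merged = []
    for part in heapq.merge(*shard_streams, key=lambda g: g["_id"]):
        if merged and merged[-1]["_id"] == part["_id"]:
            merged[-1]["count"] += part["count"]   # combine partial $sum
        else:
            merged.append(dict(part))
    return merged

# _id 1 exists on both shards; _id 2 and 3 each exist on only one
shard_a = [{"_id": 1, "count": 2}, {"_id": 3, "count": 1}]
shard_b = [{"_id": 1, "count": 1}, {"_id": 2, "count": 4}]
combined = merge_shard_groups([shard_a, shard_b])
```

As with a merge sort, only the head of each shard stream needs to be buffered, so the streaming property is preserved across the merge.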

Generated at Thu Feb 08 03:06:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.