[SERVER-31234] Hyperloglog Counting Created: 24/Sep/17  Updated: 20/Oct/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Aayush
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 7
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

The use case is counting the number of distinct elements when the set is very large and an approximate cardinality is sufficient.

At present there are two ways to count the distinct elements of a set while grouping:

  1. $addToSet followed by $size.
  2. $group with the element in _id followed by another $group stage which collects and counts all such documents.

The first approach can hit the 16MB document size limit quickly, because the whole set is accumulated in a single document. The second approach has a lot of memory overhead, since it produces one group per distinct element, and is therefore very slow.
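For reference, the two existing approaches can be sketched as the pipelines below. This is only an illustration: the field names `date` and `user_id` and the collection name `events` are assumptions, not taken from a real schema.

```javascript
// Approach 1: collect the distinct values into an array, then take its size.
// The `users` array lives inside one document, so it is subject to the 16MB limit.
const addToSetPipeline = [
  { $group: { _id: "$date", users: { $addToSet: "$user_id" } } },
  { $project: { users_count: { $size: "$users" } } },
];

// Approach 2: group on (date, user_id) first so each distinct pair becomes one
// document, then group again by date and count those documents.
const doubleGroupPipeline = [
  { $group: { _id: { date: "$date", user: "$user_id" } } },
  { $group: { _id: "$_id.date", users_count: { $sum: 1 } } },
];

// Usage in the shell (hypothetical collection name):
// db.events.aggregate(addToSetPipeline)
// db.events.aggregate(doubleGroupPipeline)
```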

A HyperLogLog-based approach would reduce these overheads and would probably be faster.
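To make the idea concrete, here is a minimal client-side sketch of HyperLogLog in JavaScript (Node.js). It is not how MongoDB would implement it: the class name, the default precision of 12 bits, and the use of a truncated SHA-1 as the 32-bit hash are all assumptions chosen for illustration, and the large-range correction is omitted.

```javascript
const crypto = require("crypto");

// 32-bit hash of the value's string form (assumption: truncated SHA-1;
// any well-mixed hash function would serve the same purpose).
function hash32(value) {
  return crypto.createHash("sha1").update(String(value)).digest().readUInt32BE(0);
}

class HyperLogLog {
  // precision p gives m = 2^p one-byte registers; relative error is about 1.04 / sqrt(m).
  constructor(precision = 12) {
    if (precision < 7 || precision > 16) throw new RangeError("precision must be 7..16");
    this.p = precision;
    this.m = 1 << precision;
    this.registers = new Uint8Array(this.m);
  }

  add(value) {
    const h = hash32(value);
    const index = h >>> (32 - this.p);        // top p bits select the register
    const rest = (h << this.p) >>> 0;         // remaining bits, left-aligned
    // rank = position of the first 1-bit in the remaining bits
    const rank = rest === 0 ? 32 - this.p + 1 : Math.clz32(rest) + 1;
    if (rank > this.registers[index]) this.registers[index] = rank;
  }

  // Merging two sketches is a register-wise maximum, so per-group sketches
  // can be combined without revisiting the original elements.
  merge(other) {
    if (other.p !== this.p) throw new Error("precision mismatch");
    for (let i = 0; i < this.m; i++) {
      if (other.registers[i] > this.registers[i]) this.registers[i] = other.registers[i];
    }
    return this;
  }

  count() {
    const m = this.m;
    const alpha = 0.7213 / (1 + 1.079 / m);   // bias constant, valid for m >= 128
    let sum = 0;
    let zeros = 0;
    for (const r of this.registers) {
      sum += Math.pow(2, -r);
      if (r === 0) zeros++;
    }
    let estimate = (alpha * m * m) / sum;
    // Small-range correction: fall back to linear counting while registers are sparse.
    if (estimate <= 2.5 * m && zeros > 0) estimate = m * Math.log(m / zeros);
    return Math.round(estimate);
  }
}
```

With the default precision the sketch occupies 4096 bytes regardless of how many elements are added, which is the property that avoids both the document size limit and the per-element group overhead described above.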



 Comments   
Comment by Arturs Sosins [ 23/Feb/23 ]

It does not have to be "the" cardinality counting pipeline operator. It could literally be a $hyperloglog operator/stage that does this one specific thing.

To get users per day

{
  $group:
    {
      _id: "$date",
      users: { $hll_add : "$user_id" }
    }
}

Get the count for each day

{
  $project:
    {
      _id: "$_id",
      users_count: { $hll_count : "$users" }
    }
}

Or merge multiple sets to get the total count

{
  $group:
    {
      _id: null,
      total_users: { $hll_merge : "$users" }
    }
}

 

 

Comment by apocarteres [ 16/Nov/17 ]

I guess it's not possible unless MongoDB supports plugins. Implementing cardinality counting with a hardcoded HLL+ would create backward compatibility issues if MongoDB later decides to pick a different algorithm for counting cardinality.

Comment by Kelsey Schubert [ 25/Sep/17 ]

Hi hyades,

Thank you for the feature request; I've marked it for consideration. Please continue to watch this ticket for updates.

Kind regards,
Kelsey

Generated at Thu Feb 08 04:26:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.