The use case is counting the number of distinct elements when the set is very large and an approximate cardinality is sufficient.
Currently, there are two ways to count the number of distinct elements per group:
- $addToSet followed by $size.
- $group with the element in _id, followed by another $group stage that counts the resulting documents.
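The two existing approaches can be sketched as aggregation pipelines. This is a minimal illustration written as Python dicts; the field names `category` (the grouping key) and `user_id` (the value whose distinct count is wanted) are hypothetical.

```python
# Hypothetical field names: group by "category", count distinct "user_id".
GROUP_KEY = "$category"
VALUE = "$user_id"

# Approach 1: collect the distinct values into an array, then take its size.
# The whole array lives inside one result document, so it is bound by the
# 16MB BSON document size limit.
add_to_set_pipeline = [
    {"$group": {"_id": GROUP_KEY, "values": {"$addToSet": VALUE}}},
    {"$project": {"distinct_count": {"$size": "$values"}}},
]

# Approach 2: group on (key, value) pairs so that each distinct value becomes
# its own document, then group again on the key and count those documents.
double_group_pipeline = [
    {"$group": {"_id": {"key": GROUP_KEY, "value": VALUE}}},
    {"$group": {"_id": "$_id.key", "distinct_count": {"$sum": 1}}},
]
```

With PyMongo, either list can be passed to `collection.aggregate(...)`.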
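The first approach can hit the 16MB document size limit quickly, because every distinct value for a group is stored in a single array inside one document. The second approach avoids that limit but has a large memory overhead, since every distinct (group, value) pair becomes a separate document, which makes it very slow.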
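A HyperLogLog-based approach would reduce these overheads, since it keeps a small fixed-size sketch per group instead of every distinct value, and would probably be faster, in exchange for an approximate count.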
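To illustrate why HyperLogLog fits this use case, here is a minimal Python sketch of the standard technique. It is not a proposed MongoDB API: the class name, the precision parameter `p`, and hashing the value's string form with BLAKE2b are all assumptions for illustration. The state is a fixed array of `2**p` registers no matter how many distinct values are added, and two sketches merge by register-wise max, which is what a grouped or sharded aggregation would need.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch only)."""

    def __init__(self, p: int = 14):
        # m = 2**p registers; relative standard error is roughly 1.04 / sqrt(m).
        if not 4 <= p <= 16:
            raise ValueError("p must be in [4, 16]")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: hash the value's string form to 64 bits.
        # A server-side version would hash the encoded value instead.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value) -> None:
        x = self._hash64(value)
        # The top p bits select a register.
        idx = x >> (64 - self.p)
        # In the remaining (64 - p) bits, rank is the 1-based position
        # of the leftmost 1-bit.
        w = x & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - w.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Register-wise max combines two partial sketches.
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> int:
        m = self.m
        if m == 16:
            alpha = 0.673
        elif m == 32:
            alpha = 0.697
        elif m == 64:
            alpha = 0.709
        else:
            alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            estimate = m * math.log(m / zeros)
        return round(estimate)


if __name__ == "__main__":
    hll = HyperLogLog(p=14)
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(hll.count())  # an estimate close to 100000
```

With `p=14` the sketch holds 16,384 small registers, and the expected error is around 1%. Adding a value that was already seen never changes the estimate, so duplicates cost nothing extra.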