- Type: New Feature
- Resolution: Won't Do
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- QO 2021-11-15, QO 2021-11-29, QO 2022-01-24, QO 2022-02-07, QO 2022-02-21
The $count stage works on both collections and views, but always gives an exact count. The $collStats stage can produce a faster, estimated count, but it only works on a collection. $collStats also returns other information that might not be relevant or easy to define for a view.
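For reference, the two existing approaches can be written as the pipeline definitions below. This is only a sketch of the pipeline shapes: the output field name `n` is an arbitrary choice for illustration, and actually running either pipeline needs a live deployment.

```javascript
// Exact count: works on collections and views, but has to count every document.
const exactCountPipeline = [{ $count: "n" }];

// Estimated count: $collStats must be the first stage and only works on a
// collection. The { count: {} } option adds a `count` field, taken from
// collection metadata, to the output document.
const estimatedCountPipeline = [
  { $collStats: { count: {} } },
  { $project: { n: "$count" } }, // keep only the count, renamed to `n`
];
```

In mongosh, either array would be passed to `aggregate()` on the target collection.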
Let's introduce a new stage which:
- like $count, can run anywhere in a pipeline, on a collection or view.
- like $collStats, can provide a fast estimated count.
- lets the user decide whether to allow falling back to a slower, exact count.
Syntax could be something like:
{$approxCount: {as: <string>, errorWhenNoApproximation: <boolean, defaults to true>}}
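To make the proposed options concrete, here is a minimal sketch of how such a spec might be validated and defaulted. `$approxCount` does not exist in MongoDB; the helper `parseApproxCountSpec` and its error messages are hypothetical and only model the syntax proposed above.

```javascript
// Hypothetical helper: validates an $approxCount spec and fills in the default.
function parseApproxCountSpec(spec) {
  if (typeof spec.as !== "string" || spec.as.length === 0) {
    throw new TypeError("$approxCount requires 'as' to be a non-empty string");
  }
  // The proposal defaults errorWhenNoApproximation to true.
  const { errorWhenNoApproximation = true } = spec;
  if (typeof errorWhenNoApproximation !== "boolean") {
    throw new TypeError("'errorWhenNoApproximation' must be a boolean");
  }
  return { as: spec.as, errorWhenNoApproximation };
}

// Strict form (the proposed default): fail rather than run a slow exact count.
const strictStage = { $approxCount: { as: "n" } };
// Lenient form: allow falling back to an exact count.
const lenientStage = { $approxCount: { as: "n", errorWhenNoApproximation: false } };

console.log(parseApproxCountSpec(strictStage.$approxCount));
console.log(parseApproxCountSpec(lenientStage.$approxCount));
```

Defaulting to the strict behavior means a user who asks for a fast count never silently pays for a full scan; the fallback has to be requested explicitly.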
Implementation notes:
- This would be a new subclass of DocumentSource.
- getNext() can do whatever $collStats does to get the estimated count.
- optimizeAt() can eliminate preceding stages that preserve the document count. For example, [$project, $approxCount] becomes just [$approxCount].
- If an estimated count is not possible and errorWhenNoApproximation is false, then during optimization the stage can expand to the same $group stage that $count expands to. Other optimizations may then apply to the $group.
- On a sharded collection, $approxCount can run partially on the shards and partially on the merger. Each shard can run its own $approxCount, and the merger can add up the per-shard results with a $group stage that uses $sum.
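The optimization, fallback, and sharding notes above can be modeled on plain pipeline arrays. This is a rough sketch, not server code: the real implementation would be a C++ DocumentSource, all function names here are hypothetical, the set of count-preserving stages beyond $project is an assumption, and the fallback is keyed on the errorWhenNoApproximation option from the proposed syntax.

```javascript
// Assumption: stages that neither add nor remove documents. The ticket names
// only $project; $addFields and $set are included for illustration.
const COUNT_PRESERVING = new Set(["$project", "$addFields", "$set"]);

const stageName = (stage) => Object.keys(stage)[0];

// Drop count-preserving stages that sit directly before $approxCount,
// so [$project, $approxCount] becomes [$approxCount].
function dropCountPreservingStages(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    if (stageName(stage) === "$approxCount") {
      while (out.length > 0 && COUNT_PRESERVING.has(stageName(out[out.length - 1]))) {
        out.pop();
      }
    }
    out.push(stage);
  }
  return out;
}

// Fallback when no estimate is available: expand to the $group + $project
// form that the MongoDB documentation gives as the equivalent of $count.
function expandToExactCount(spec) {
  if (spec.errorWhenNoApproximation !== false) {
    throw new Error("$approxCount: no estimated count is available");
  }
  return [
    { $group: { _id: null, [spec.as]: { $sum: 1 } } },
    { $project: { _id: 0 } },
  ];
}

// Sharded case: each shard runs $approxCount, the merger sums the results.
function splitForSharding(spec) {
  return {
    shardsPipeline: [{ $approxCount: spec }],
    mergerPipeline: [
      { $group: { _id: null, [spec.as]: { $sum: "$" + spec.as } } },
      { $project: { _id: 0 } },
    ],
  };
}
```

Note that the stage-dropping loop stops at the first stage that can change the document count, such as $match, which is why an estimate is only available when nothing count-changing precedes $approxCount.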