[SERVER-61332] Introduce $approxCount stage that can work on a view
Created: 08/Nov/21  Updated: 22/Feb/22  Resolved: 18/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: David Percy
Assignee: Matt Boros
Resolution: Won't Do
Votes: 2
Labels: read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: QO 2021-11-15, QO 2021-11-29, QO 2022-01-24, QO 2022-02-07, QO 2022-02-21
Participants:

 Description   

The $count stage works on both collections and views, but always gives an exact count. The $collStats stage can produce a faster, estimated count, but it only works on a collection. $collStats also returns other information that might not be relevant or easy to define for a view.
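
For reference, the two existing options contrasted above could be written as the pipelines below. This is a minimal sketch: the output field name n and the collection and view names in the comments are assumptions chosen for illustration.

```javascript
// Exact count: runs on collections and views, but counts every document
// that reaches the stage.
const exactCountPipeline = [{ $count: "n" }];

// Estimated count: $collStats reads the count from collection metadata,
// so per the description it is only available on a collection.
const estimatedCountPipeline = [
  { $collStats: { count: {} } },
  { $project: { n: "$count" } }, // keep only the count, renamed to n
];

// In mongosh (names are hypothetical):
//   db.orders.aggregate(estimatedCountPipeline)  // fast, estimated
//   db.ordersView.aggregate(exactCountPipeline)  // exact, works on a view
```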

Let's introduce a new stage which:

  • like $count, can run anywhere in a pipeline, on a collection or view.
  • like $collStats, can provide a fast estimated count.
  • lets the user decide whether to allow falling back to a slower, exact count.

Syntax could be something like:

{$approxCount: {
    as: <string>,
    errorWhenNoApproximation: <boolean, defaults to true>,
}}
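
As a rough illustration of how that spec might be read, the sketch below applies the proposed default. The stage was never implemented, so parseApproxCountSpec and its error messages are hypothetical and only mirror the syntax proposed above.

```javascript
// Hypothetical parser for the proposed {$approxCount: {...}} spec.
// Returns the spec with errorWhenNoApproximation defaulted to true.
function parseApproxCountSpec(stage) {
  const spec = stage.$approxCount;
  if (spec === null || typeof spec !== "object") {
    throw new TypeError("$approxCount expects an object");
  }
  if (typeof spec.as !== "string" || spec.as.length === 0) {
    throw new TypeError("$approxCount.as must be a non-empty string");
  }
  const flag =
    spec.errorWhenNoApproximation === undefined
      ? true // proposed default: fail rather than fall back to an exact count
      : spec.errorWhenNoApproximation;
  if (typeof flag !== "boolean") {
    throw new TypeError("$approxCount.errorWhenNoApproximation must be a boolean");
  }
  return { as: spec.as, errorWhenNoApproximation: flag };
}

// Example: opt in to the slower exact fallback.
const parsed = parseApproxCountSpec({
  $approxCount: { as: "n", errorWhenNoApproximation: false },
});
```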

Implementation notes:

  • This would be a new subclass of DocumentSource.
  • getNext() can do whatever $collStats does to get the estimated count.
  • optimizeAt() can eliminate stages that preserve count. [$project, $approxCount] becomes just [$approxCount].
  • If an estimated count is not possible, and the user has allowed the exact fallback (errorWhenNoApproximation: false), then during optimization the stage can expand to the same $group stage that $count expands to. Other optimizations may then apply to the $group.
  • On a sharded collection, $approxCount can run partially on the shards and partially on the merger. Each shard can run its own $approxCount, and the merger can sum the per-shard results with a $group that uses $sum.
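
The optimization, fallback, and sharded-merge notes above can be sketched on plain pipeline arrays. This models the idea in JavaScript and is not the server's C++ DocumentSource code; the function names and the set of count-preserving stages are assumptions for illustration.

```javascript
// Stages assumed to leave the number of documents unchanged (not exhaustive).
const COUNT_PRESERVING = new Set(["$project", "$addFields", "$set", "$unset", "$sort"]);

const stageName = (stage) => Object.keys(stage)[0];

// Drop count-preserving stages that sit directly before $approxCount, so
// [$project, $approxCount] becomes just [$approxCount].
function eliminateCountPreservingStages(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    if (stageName(stage) === "$approxCount") {
      while (out.length > 0 && COUNT_PRESERVING.has(stageName(out[out.length - 1]))) {
        out.pop();
      }
    }
    out.push(stage);
  }
  return out;
}

// When no estimate is available, expand to the stages that $count is
// documented to be equivalent to, unless the user asked for an error.
function expandToExactCount(stage) {
  const { as, errorWhenNoApproximation = true } = stage.$approxCount;
  if (errorWhenNoApproximation) {
    throw new Error("$approxCount: no estimated count is available");
  }
  return [{ $group: { _id: null, [as]: { $sum: 1 } } }, { $project: { _id: 0 } }];
}

// Merger half of the sharded plan: sum the per-shard estimates.
function mergerPipeline(as) {
  return [{ $group: { _id: null, [as]: { $sum: "$" + as } } }, { $project: { _id: 0 } }];
}

// Plain-JavaScript equivalent of what the merger pipeline computes.
function mergeShardEstimates(shardResults, as) {
  const total = shardResults.reduce((sum, doc) => sum + doc[as], 0);
  return { [as]: total };
}
```

For example, mergeShardEstimates([{ n: 10 }, { n: 32 }], "n") combines two per-shard estimates into a single document, just as the merger's $group would.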


 Comments   
Comment by James Wahlin [ 18/Feb/22 ]

ewan.higgs@deliverect.com, this comment on SERVER-63850 describes the rationale behind this. The original motivation for this addition was to provide a stable-API supported means for our estimatedDocumentCount helper to return a count for non-materialized views. The mechanism for doing so in 5.0 had changed because we were not planning to add the count command to the stable API. We have instead decided to add count to the stable API and will update the drivers to use it.

For those currently on 5.0 who would like the previous behavior, the workaround is to use the count command directly. Doing so requires a driver client that has not specified API strict.
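
A minimal sketch of that workaround follows, assuming the Node.js driver; the view name ordersView and the helper buildCountCommand are hypothetical, and the commented lines show one way the command might be sent.

```javascript
// Build the document for the count command. With no query, the server
// can answer from metadata where it is able to.
function buildCountCommand(collectionOrView, query) {
  const cmd = { count: collectionOrView };
  if (query !== undefined) {
    cmd.query = query;
  }
  return cmd;
}

const countCmd = buildCountCommand("ordersView");

// Sending it with the Node.js driver, on a client that does not enable
// API strict mode (connection string and database name are placeholders):
//   const { MongoClient } = require("mongodb");
//   const client = new MongoClient(uri, { serverApi: { version: "1", strict: false } });
//   const { n } = await client.db("shop").command(countCmd);
```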

Comment by Ewan Higgs [ 18/Feb/22 ]

Is there anywhere to track why this went from In Review to Won't Do? I don't see a PR here: https://github.com/mongodb/mongo/pulls?q=is%3Apr+SERVER-61332
