Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
Labels: None
Documentation Request Summary:
The page below will need to be updated:
https://docs.mongodb.com/manual/reference/command/count/
The following paragraphs should be changed:
---------------------------------------------------
On a sharded cluster, count can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
To avoid these situations, on a sharded cluster, use the $group stage of the db.collection.aggregate() method to $sum the documents. For example, the following operation counts the documents in a collection:
---------------------------------------------------
The new behavior is that, on a sharded cluster, a "fast count" may return inaccurate results, whereas a count() with a predicate does not (as of 4.0). See the section "Behavior of "fast count" and non-"fast count"" in the ticket description below. If you have any questions, feel free to Slack, email, or comment on this ticket!
Scope of changes:
- count (various)
Impact to other docs outside of this product:
none
Engineering Ticket Description:
Summary
Count does not filter out unowned (orphaned) documents and can therefore report a larger value than a normal query returns, or than itcount() reports in the shell.
Causes
The following conditions can lead to counts being off:
- Active migrations
- Orphaned documents (left from failed migrations)
- Non-primary read preferences (see SERVER-5931)
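The causes above can be illustrated with a toy model in plain JavaScript. Everything in it is hypothetical (the routing table, shard names, and helper functions are not MongoDB internals): a document belongs in the true total only if the shard storing it also owns its shard key range, while a count that skips the ownership check also picks up orphans and the copies made by an in-progress migration.

```javascript
// Toy model (hypothetical, not MongoDB internals): documents are keyed by a
// shard key `x`, and the routing table says which shard owns each range.
const routingTable = [
  { min: 0, max: 50, shard: "shardA" },   // [0, 50) owned by shardA
  { min: 50, max: 100, shard: "shardB" }, // [50, 100) owned by shardB
];

function ownerOf(x) {
  const chunk = routingTable.find((c) => x >= c.min && x < c.max);
  return chunk ? chunk.shard : null;
}

// shardB still physically holds x = 10 and x = 20, which shardA owns: orphans
// left by a failed migration, or copies from a migration still in progress.
const shardDocs = {
  shardA: [{ x: 5 }, { x: 10 }, { x: 20 }],
  shardB: [{ x: 10 }, { x: 20 }, { x: 60 }, { x: 70 }],
};

// Summing whatever each shard stores counts the orphans as well.
function unfilteredCount(shards) {
  return Object.values(shards).reduce((n, docs) => n + docs.length, 0);
}

// Counting only documents owned by the shard that stores them matches what a
// normal (filtered) query would return.
function ownedCount(shards) {
  return Object.entries(shards).reduce(
    (n, [name, docs]) => n + docs.filter((d) => ownerOf(d.x) === name).length,
    0,
  );
}

console.log(unfilteredCount(shardDocs)); // 7
console.log(ownedCount(shardDocs));      // 5
```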
Workaround
To get accurate counts, ensure that all migrations have been cleaned up and that no migrations are active. When querying non-primaries, you must also ensure, in addition to the above requirements, that there is no replication lag, including for any migration data.
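The workaround amounts to a checklist. The sketch below encodes it as a single predicate; the `state` field names are hypothetical and chosen for illustration only, they are not the output of any MongoDB command.

```javascript
// Hypothetical checklist for the workaround above. The `state` fields are
// illustrative names, not fields returned by any real MongoDB command.
function countIsTrustworthy(state) {
  if (state.activeMigrations > 0) return false;  // a migration is in progress
  if (state.orphansPendingCleanup) return false; // leftovers from failed migrations
  // Reads from non-primaries additionally require zero replication lag,
  // including for migration data (see also SERVER-5931).
  if (state.readPreference !== "primary" && state.maxReplicationLagSecs > 0) {
    return false;
  }
  return true;
}

const base = { activeMigrations: 0, orphansPendingCleanup: false };
console.log(countIsTrustworthy({ ...base, readPreference: "primary", maxReplicationLagSecs: 3 }));   // true
console.log(countIsTrustworthy({ ...base, readPreference: "secondary", maxReplicationLagSecs: 3 })); // false
```

Replication lag is only checked for non-primary reads, mirroring the ticket's wording that lag is an extra requirement when querying non-primaries.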
Non-Primary Reads
For issues with counts or reads from non-primaries, see SERVER-5931.
Behavior of "fast count" and non-"fast count"
A "fast count" is a count run without a predicate. It is "fast" because the implementation only reads the metadata, without fetching any documents.
Starting in 4.0, the problem of count() reporting inaccurate results has been fixed for non-"fast counts": counts run with a predicate are accurate on sharded clusters. "Fast counts" (count() run without a predicate) may still report too many documents (see SERVER-33753).
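The difference between the two kinds of count can be sketched with a toy model. It assumes, for illustration only, that each shard keeps a cached record count (`numRecords`) in its metadata and that ownership is stored as an `owner` field on each document; in a real cluster ownership is determined by shard key ranges, and none of these names are MongoDB internals.

```javascript
// Toy model (hypothetical): each shard has a metadata record count plus the
// documents themselves. Ownership is simplified to a per-document `owner`.
const shardState = [
  {
    name: "shardA",
    numRecords: 3, // metadata counter: includes every stored document
    docs: [
      { status: "A", owner: "shardA" },
      { status: "B", owner: "shardA" },
      { status: "A", owner: "shardB" }, // orphan: shardB owns this range
    ],
  },
  {
    name: "shardB",
    numRecords: 2,
    docs: [
      { status: "A", owner: "shardB" },
      { status: "A", owner: "shardB" },
    ],
  },
];

// "Fast count": no predicate, so only metadata is read and orphans are included.
function fastCount(shards) {
  return shards.reduce((n, s) => n + s.numRecords, 0);
}

// Count with a predicate (4.0+ behavior): documents are examined, and only
// those owned by the shard that stores them are counted.
function predicateCount(shards, predicate) {
  return shards.reduce(
    (n, s) => n + s.docs.filter((d) => d.owner === s.name && predicate(d)).length,
    0,
  );
}

console.log(fastCount(shardState));                                // 5 (only 4 documents are owned)
console.log(predicateCount(shardState, (d) => d.status === "A")); // 3
```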
In general, if one needs an accurate count of how many documents are in a collection, we do not recommend using the count command. Instead, we suggest using the $count aggregation stage, like this:
db.foo.aggregate([{$count: "nDocs"}]);
See the docs.
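The MongoDB manual describes `$count` as equivalent to a `$group` stage that sums `1` per document followed by a `$project` that drops `_id`, which is the same `$group`/`$sum` approach the quoted paragraph at the top of this ticket mentions. The pipelines below spell this out; `status` is a hypothetical field used only to show where a predicate would go, and the `db.foo` calls run only inside the mongo shell.

```javascript
// { $count: "nDocs" } and the equivalent $group + $project pair.
const countStage = [{ $count: "nDocs" }];
const groupEquivalent = [
  { $group: { _id: null, nDocs: { $sum: 1 } } },
  { $project: { _id: 0 } },
];

// A predicate goes in a leading $match stage ("status" is a hypothetical field).
const filteredCount = [{ $match: { status: "A" } }, { $count: "nDocs" }];

// Only runs inside the mongo shell, where a global `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.foo.aggregate(countStage).toArray());
  printjson(db.foo.aggregate(groupEquivalent).toArray());
  printjson(db.foo.aggregate(filteredCount).toArray());
}
```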
For users who need the performance of "fast count", and are okay with approximate results, we suggest using $collStats instead of the count command:
db.matrices.aggregate( [ { $collStats: { count: { } } } ] )
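On a sharded collection, `$collStats` with `count` returns one document per shard, so the per-shard `count` values have to be added up to get a collection-wide (still approximate) figure. The sample documents below are made up and trimmed to a few fields; the `$group` stage after `$collStats` is one way to do the same sum on the server, and that part runs only inside the mongo shell.

```javascript
// Made-up sample shaped like $collStats { count: {} } output on a sharded
// collection (one document per shard; other fields omitted).
const perShardStats = [
  { ns: "test.matrices", shard: "shardA", count: 1200 },
  { ns: "test.matrices", shard: "shardB", count: 800 },
];

// Adds up the per-shard counts; the result may still include orphans.
function approximateTotal(docs) {
  return docs.reduce((n, d) => n + d.count, 0);
}

console.log(approximateTotal(perShardStats)); // 2000

// Server-side equivalent; only runs inside the mongo shell.
if (typeof db !== "undefined") {
  printjson(db.matrices.aggregate([
    { $collStats: { count: {} } },
    { $group: { _id: null, approxCount: { $sum: "$count" } } },
  ]).toArray());
}
```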
documents: SERVER-3645 Sharded collection counts (on primary) can report too many results (Closed)