[SERVER-34453] aggregation $count underperforms count() Created: 13/Apr/18 Updated: 27/Oct/23 Resolved: 25/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.7.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | tony kerz | Assignee: | William Byrne III |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
i have a collection with 10m rows. this returns in like 10ms:
while this returns like 10s:
doesn't seem to be related to https://jira.mongodb.org/browse/SERVER-7568, but similar in marked performance diff doing similar ops in aggregation and non-aggregation styles. |
| Comments |
| Comment by William Byrne III [ 01/May/18 ] |
|
Hi Tony,

The reason for these performance differences is that a count() without a predicate doesn't fetch the matching documents and actually count them; instead, it uses the collection metadata to return the number of documents that would match an equivalent find() query.

In contrast, the $count aggregation stage iterates through the documents passed to it from the prior stage (or from the collection/index scan if it is the first stage) and counts them. The equivalent non-aggregation command for $count is actually itcount(), as it also iterates through the matching documents. You will find $count and itcount() have similar performance.

Note also that while just reading the collection metadata, as count() does, is faster, it can return inaccurate counts for sharded clusters if there are orphaned documents or ongoing chunk migrations. Both $count and itcount() queries pull the documents to be counted through the SHARDING_FILTER, which eliminates duplicated documents from failed and ongoing migrations, so their results are accurate.

Regards,
William Byrne III |
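To make the trade-off described above concrete, here is a minimal toy model in plain JavaScript. It is only a sketch and does not reflect MongoDB's actual storage engine or query planner: `ToyShard`, `fastCount`, `iteratedCount`, and the `owned` flag are hypothetical names invented for illustration. The metadata-style count reads a stored number, while the iterating count walks every document and skips orphans.

```javascript
// Toy model only: NOT MongoDB internals. All names here are hypothetical.
class ToyShard {
  constructor(ownedDocs, orphanDocs) {
    // Orphans are physically stored on the shard but no longer owned by it.
    this.docs = [
      ...ownedDocs.map((d) => ({ ...d, owned: true })),
      ...orphanDocs.map((d) => ({ ...d, owned: false })),
    ];
    // Metadata records how many documents are stored, orphans included.
    this.metadataCount = this.docs.length;
  }

  // Analogous to count() with no predicate: read the stored number only.
  fastCount() {
    return this.metadataCount;
  }

  // Analogous to $count / itcount(): visit every document and filter.
  iteratedCount() {
    let n = 0;
    for (const doc of this.docs) {
      if (doc.owned) n += 1; // stand-in for the SHARDING_FILTER stage
    }
    return n;
  }
}

const shard = new ToyShard(
  [{ _id: 1 }, { _id: 2 }, { _id: 3 }],
  [{ _id: 4 }] // assumed leftover from a failed chunk migration
);

console.log(shard.fastCount());     // 4 (constant time, counts the orphan)
console.log(shard.iteratedCount()); // 3 (linear time, orphan filtered out)
```

In the mongo shell, the three operations being compared would look roughly like `db.coll.count()`, `db.coll.aggregate([{ $count: "n" }])`, and `db.coll.find().itcount()`, where `coll` and `"n"` are placeholder names.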