-
Type: Improvement
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Execution
-
Fully Compatible
On my workstation, running
db.test.aggregate([{$group: {_id: null, total: {$sum: "$a"}}}])
on 16 million {a:2} documents takes about 10 seconds. The attached flamegraph shows that roughly 4% of the perf samples collected occur in the hash function for HashAggStage's _ht hash table.
Given this query is guaranteed to return only one document, there is no need for the hash table in the aggregation. It probably makes sense to have a new plan stage, to separate the code from HashAggStage and to make explain output clear.
Getting the collection filled with documents can take some time, I found this python snippet to be helpful
for (let i = 0; i < 1000*4; i++) { db.bar.insertMany([' + ','.join(['{a:2}' for _ in range(256)]) + "])}