  1. Core Server
  2. SERVER-87719

SBE group by constant optimization

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution
    • Fully Compatible

      flamegraph.svg

      On my workstation, running

      db.test.aggregate([{$group: {_id: null, total: {$sum: "$a"}}}]) 

      on 16 million {a:2} documents takes about 10 seconds. The attached flamegraph shows that roughly 4% of the perf samples collected occur in the hash function for HashAggStage's _ht hash table.

      Because the group key is a constant, this query is guaranteed to return at most one document, so the aggregation does not need a hash table. It probably makes sense to add a new plan stage, both to keep the code separate from HashAggStage and to make the explain output clear.
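
      To illustrate the idea (this is not the SBE implementation, just a minimal sketch with hypothetical function names), the snippet below contrasts a general hash-based group-by with a constant-key version that keeps a single accumulator. It assumes a simplified $sum in which a missing field counts as 0.

```python
from collections import defaultdict


def group_sum_hashed(docs, key_fn, field):
    """General group-by: one hash-table lookup per input document."""
    totals = defaultdict(int)
    for doc in docs:
        totals[key_fn(doc)] += doc.get(field, 0)
    return [{"_id": key, "total": total} for key, total in totals.items()]


def group_sum_constant(docs, constant_key, field):
    """Constant group key: a single accumulator, no hashing at all."""
    total = 0
    seen_any = False
    for doc in docs:
        total += doc.get(field, 0)
        seen_any = True
    # No input documents means no group, hence no output document.
    return [{"_id": constant_key, "total": total}] if seen_any else []


docs = [{"a": 2} for _ in range(1000)]
hashed = group_sum_hashed(docs, lambda doc: None, "a")
constant = group_sum_constant(docs, None, "a")
```

      Both functions produce the same result for a constant key; the second one simply never touches a hash table, which is where the flamegraph shows the avoidable samples.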

      Filling the collection with documents can take some time. I found this Python snippet helpful; it generates a shell loop that inserts the documents in batches of 256:

      print('for (let i = 0; i < 1000*4; i++) { db.bar.insertMany([' + ','.join(['{a:2}' for _ in range(256)]) + "])}")
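
      A slightly more reusable version of the same generator might look like the sketch below. The function name and its parameters are hypothetical, added only for illustration. With the defaults it emits a loop that inserts 1000*4 batches of 256 documents (1,024,000 in total) into db.bar, so reaching 16 million documents in db.test would need a different collection name and a larger iteration count (for example 62,500).

```python
def build_insert_loop(collection="bar", iterations=1000 * 4, batch_size=256):
    """Return a shell loop that inserts batch_size {a:2} documents per iteration."""
    batch = ",".join("{a:2}" for _ in range(batch_size))
    return (
        f"for (let i = 0; i < {iterations}; i++) "
        f"{{ db.{collection}.insertMany([{batch}]) }}"
    )


# Print the loop so it can be pasted into the shell.
print(build_insert_loop())
```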

            Assignee:
            evan.bergeron@mongodb.com Evan Bergeron
            Reporter:
            evan.bergeron@mongodb.com Evan Bergeron
