Core Server / SERVER-37242

$group on sort key (after $sort) could be optimized to avoid blocking

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • Query

      $group seems to be a blocking stage regardless of the conditions: it consumes its entire input before emitting any group. Simple groupings on a key that the input is already sorted by could avoid blocking to some degree, and would certainly use less memory.


      For example, run the following against the "tweets" data set from the University, with an index on "user.screen_name":

      > db.tweets.aggregate( [
       { $sort: { "user.screen_name": 1 } },
       { $group: { _id: "$user.screen_name", tweets: { $push: "$$CURRENT" } } }
      ] )

      It fails with:

      "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."

      The documents entering $group are already sorted by the group key. Whenever a document with a new key arrives, the previous key can never be seen again, so the previous key's bucket could be emitted immediately and $group could begin a new bucket, with no need to block.
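      A minimal sketch of the proposed behavior, written in plain JavaScript rather than server code. It assumes the input is already sorted on the group key and that keys are primitive values comparable with `!==` (MongoDB's own BSON comparison rules are more involved). `streamingGroup` and the sample documents are hypothetical; the generator yields each bucket as soon as the key changes, mirroring the `$push: "$$CURRENT"` into `tweets` from the pipeline in this ticket.

```javascript
// Sketch only: not MongoDB server code. Assumes the input is already sorted
// on the group key and that keys are primitives comparable with !==.
function* streamingGroup(sortedDocs, keyFn) {
  let currentKey;
  let bucket = null; // documents for the current key; null before any input
  for (const doc of sortedDocs) {
    const key = keyFn(doc);
    if (bucket !== null && key !== currentKey) {
      // Sorted input means currentKey can never reappear, so emit its
      // bucket now instead of waiting for the end of the input.
      yield { _id: currentKey, tweets: bucket };
      bucket = null;
    }
    if (bucket === null) {
      currentKey = key;
      bucket = [];
    }
    bucket.push(doc); // like $push: "$$CURRENT"
  }
  if (bucket !== null) {
    yield { _id: currentKey, tweets: bucket }; // flush the last group
  }
}

// Usage: count how many input documents are read before the first group appears.
const docs = [
  { user: { screen_name: "alice" }, text: "a1" },
  { user: { screen_name: "alice" }, text: "a2" },
  { user: { screen_name: "bob" }, text: "b1" },
  { user: { screen_name: "carol" }, text: "c1" },
];
let consumed = 0;
function* source() {
  for (const d of docs) {
    consumed++;
    yield d;
  }
}
const groups = streamingGroup(source(), (d) => d.user.screen_name);
const first = groups.next().value;
console.log(first._id, first.tweets.length, consumed); // alice 2 3
```

      The first group is available after reading three of the four documents: one document past the end of "alice" is enough to know that group is complete.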

      Notably, this would shrink the memory footprint of the $group stage to a single group's state at a time, so in these cases it could only exceed the limit if one group by itself were too large.
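      To put a rough number on the memory claim, the hypothetical helpers below count how many input documents each strategy must buffer at once (documents, not bytes, ignoring accumulator overhead). The data is an assumption for illustration: 1000 users with 5 tweets each, already sorted by screen name.

```javascript
// Hypothetical helpers: count input documents buffered at once, not bytes.
function peakBufferedBlocking(docs, keyFn) {
  const groups = new Map(); // every group stays open until the input ends
  let buffered = 0;
  let peak = 0;
  for (const doc of docs) {
    const key = keyFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
    buffered++;
    peak = Math.max(peak, buffered);
  }
  return peak; // equals docs.length: nothing is released before the end
}

function peakBufferedStreaming(sortedDocs, keyFn) {
  let started = false;
  let currentKey;
  let buffered = 0;
  let peak = 0;
  for (const doc of sortedDocs) {
    const key = keyFn(doc);
    if (started && key !== currentKey) {
      buffered = 0; // previous group emitted, its documents released
    }
    started = true;
    currentKey = key;
    buffered++;
    peak = Math.max(peak, buffered);
  }
  return peak; // equals the size of the largest group
}

// 1000 users with 5 tweets each, already sorted by screen name.
const docs = [];
for (let u = 0; u < 1000; u++) {
  const name = "user" + String(u).padStart(4, "0");
  for (let t = 0; t < 5; t++) docs.push({ user: { screen_name: name }, n: t });
}
const byScreenName = (d) => d.user.screen_name;
console.log(peakBufferedBlocking(docs, byScreenName), peakBufferedStreaming(docs, byScreenName)); // 5000 5
```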


    • Assignee: Backlog - Query Team (Inactive)
    • Reporter: Andrew Ryder (Inactive)