Core Server / SERVER-37242

$group on sort key (after $sort) could be optimized to avoid blocking


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Aggregation Framework
    • Labels:
      None

      Description

      $group appears to be a blocking stage regardless of its input: it consumes every incoming document before emitting any group. When the input is already sorted on the group key, a simple grouping could avoid most of this blocking and would use less memory.

       

      For example, using the "tweets" data set from the University with an index on "user.screen_name":

      > db.tweets.aggregate([
          { $sort: { "user.screen_name": 1 } },
          { $group: { _id: "$user.screen_name", tweets: { $push: "$$CURRENT" } } }
        ])

      Fails with:

      "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."

      The documents entering $group are already sorted on the group key. Each time a new key value arrives, the previous key value can never appear again, so its bucket could be emitted immediately and $group could start a new bucket without blocking.
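      The behaviour described in the previous paragraph can be sketched outside the server. This is a minimal, hypothetical JavaScript illustration, not MongoDB's implementation: `streamingGroup` and its `keyFn` parameter are invented names, keys are assumed to be primitives compared with `!==` (the server would apply BSON comparison rules), and the output field is called `tweets` only to mirror the pipeline above. It holds one bucket at a time and emits it as soon as the key changes.

```javascript
// Hypothetical sketch: a streaming $group over input already sorted by the
// group key. Only the current bucket is held in memory; it is emitted as soon
// as a different key arrives, because sorted input guarantees the previous key
// will never be seen again. Assumes keys are primitives comparable with !==.
function* streamingGroup(sortedDocs, keyFn) {
  let currentKey;
  let bucket = null; // documents accumulated for currentKey (like $push)
  for (const doc of sortedDocs) {
    const key = keyFn(doc);
    if (bucket !== null && key !== currentKey) {
      // Key changed: the previous group is complete, so emit it now.
      yield { _id: currentKey, tweets: bucket };
      bucket = null;
    }
    if (bucket === null) {
      // Start a new bucket for the new key.
      currentKey = key;
      bucket = [];
    }
    bucket.push(doc);
  }
  if (bucket !== null) {
    // Flush the final group once the input is exhausted.
    yield { _id: currentKey, tweets: bucket };
  }
}

// Usage: documents already sorted on user.screen_name.
const docs = [
  { user: { screen_name: "alice" }, text: "a1" },
  { user: { screen_name: "alice" }, text: "a2" },
  { user: { screen_name: "bob" }, text: "b1" },
];
for (const group of streamingGroup(docs, (d) => d.user.screen_name)) {
  console.log(group._id, group.tweets.length);
}
// prints "alice 2" then "bob 1"
```

      Because it is a generator, a consumer receives the first group as soon as the key changes, while later documents are still unread. Peak memory is the size of the largest single group rather than the whole result set.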

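      Notably, this would reduce the memory footprint of the $group stage: in these cases it would only need to hold the current group in memory rather than every group at once. A single very large group (for example, one user with a huge number of tweets collected by $push) could still exceed the limit.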

       


              People

              Assignee:
              backlog-server-query Backlog - Query Team
              Reporter:
              andrew.ryder Andrew Ryder
              Participants:
              Votes:
              0
              Watchers:
              8

                Dates

                Created:
                Updated:
                Resolved: