  1. Core Server
  2. SERVER-58809

Optimize $unwind + $group _id, to avoid blocking/spilling

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Query Execution

    Description

      Currently this is an antipattern:

      {$unwind: "$a"}
      {$match ...}
      {$group: {_id: "$_id", ...}}

      because $group is a blocking stage: it must consume all of its input before emitting any output, and it can spill to disk if the data is big enough. We recommend something like this instead:

      {$set: {a: {$filter ...}}}

      This performs better because it operates on one document at a time, so nothing has to be buffered across documents.
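To make the comparison concrete, here is a minimal sketch of both shapes written as mongosh pipeline arrays. The array field `a` comes from the description; the element type (numbers), the predicate (`>= 10`), the `$push` accumulator, and the variable names are assumptions chosen only for illustration.

```javascript
// Shape 1: unwind, filter the elements, then regroup by _id.
// $group is blocking, so this may buffer (and spill) the whole input.
const unwindPipeline = [
  { $unwind: "$a" },
  { $match: { a: { $gte: 10 } } }, // hypothetical predicate
  { $group: { _id: "$_id", a: { $push: "$a" } } },
];

// Shape 2: filter the array in place, one document at a time.
const filterPipeline = [
  {
    $set: {
      a: {
        $filter: {
          input: "$a",
          as: "elem",
          cond: { $gte: ["$$elem", 10] }, // same hypothetical predicate
        },
      },
    },
  },
];
```

In mongosh either array could be passed to `db.coll.aggregate(...)`, where `coll` is a placeholder collection name. The two shapes are not strictly equivalent: the first drops documents in which no element matches and keeps only the fields named in `$group`, while the second keeps every document (possibly with an empty `a`) and all of its other fields.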

      But the first version is nicer in some ways:

      • You can easily view intermediate results:
        • by commenting out stages,
        • or in Compass.
      • You might not need to learn two versions of every operator ($match/$filter, $addFields/$map, $group/$reduce).

      We could make the first version perform better by doing a streaming group (in this narrow case).

      • Streaming $group is valid when documents are clustered by the group key.
      • Documents in a collection are clustered by _id (because we have a unique, non-multikey index on _id).
      • $unwind preserves this (if it unwinds one document at a time).
      • $match preserves this.
      • $project/$set can preserve this, depending on which paths they write.
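The argument above can be sketched in plain JavaScript, outside the server. This is an illustrative in-memory model, not the server's implementation: `unwind`, `match`, and `streamingGroup` are hypothetical helpers, the sample data is invented, and group keys are compared with `!==`, which is only adequate for primitive `_id` values.

```javascript
// Emits one document per array element, in input order, so documents
// that share an _id stay adjacent (clustering is preserved).
function* unwind(docs, field) {
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const elems = Array.isArray(value) ? value : [value];
    for (const elem of elems) yield { ...doc, [field]: elem };
  }
}

// Filtering only removes documents, so it preserves clustering too.
function* match(docs, predicate) {
  for (const doc of docs) if (predicate(doc)) yield doc;
}

// Streaming group: valid only when the input is clustered by the key.
// It holds a single group's state and emits it when the key changes.
function* streamingGroup(docs, keyOf, init, accumulate) {
  let hasGroup = false;
  let currentKey;
  let state;
  for (const doc of docs) {
    const key = keyOf(doc);
    if (hasGroup && key !== currentKey) {
      yield { _id: currentKey, ...state };
      hasGroup = false;
    }
    if (!hasGroup) {
      currentKey = key;
      state = init();
      hasGroup = true;
    }
    accumulate(state, doc);
  }
  if (hasGroup) yield { _id: currentKey, ...state };
}

// Invented sample input, already clustered by _id.
const collection = [
  { _id: 1, a: [5, 12, 30] },
  { _id: 2, a: [1, 2] },
  { _id: 3, a: [10] },
];

const results = [
  ...streamingGroup(
    match(unwind(collection, "a"), (doc) => doc.a >= 10),
    (doc) => doc._id,
    () => ({ a: [] }),
    (state, doc) => state.a.push(doc.a),
  ),
];
```

Here `results` holds one entry per `_id` that had at least one matching element, and at no point is more than one group's state in memory. If the input were not clustered by the key, this generator would emit the same `_id` more than once, which is why the optimization only applies in the narrow case described.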

          People

            Backlog - Query Execution
            David Percy (david.percy@mongodb.com)
            Votes: 0
            Watchers: 2