Uploaded image for project: 'Java Driver'
  1. Java Driver
  2. JAVA-2256

Add builder for $bucket aggregation stage

    XMLWordPrintable

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: Builders
    • Labels:
      None
    • Sprint:
      JVM Sprint 35

      Description

      Syntax

      {
        $bucket: {
          groupBy: <arbitrary expression>,
          boundaries: [
            // Array of constant values.
          ],
          default: <constant expression>,
          output: {  // Still optional, defaults to {count: {$sum: 1}}
            fieldName1: <accumulator 1>,
            fieldName2: <accumulator 2>
          }
        }
      }
      

      Examples

      > db.example.insert([
        {_id: 0, screenSize: 30},
        {_id: 1, screenSize: 24},
        {_id: 2, screenSize: 42},
        {_id: 3, screenSize: 22},
        {_id: 4, screenSize: 55},
      ]);
      > db.example.aggregate([
        {
          $bucket: {
            groupBy: "$screenSize",
            boundaries: [0, 24, 32, 50, Infinity],  // Makes 4 buckets, the _id is based on the min:
                                                    // [0, 24), [24, 32), [32, 50), [50, Infinity)
            output: {
              count: {$sum: 1},
              matches: {$push: "$screenSize"}
            }
          }
        }
      ])
      {_id: "other", count: 2, matches: [null, "not-a-number"]} 
      {_id: 0, count: 1, matches: [22]}
      {_id: 24, count: 2, matches: [24, 30]}
      {_id: 32, count: 1, matches: [42]}
      {_id: 50, count: 1, matches: [55]}
       
      // =====================  Example #2 - Error on unexpected types ==========================
      > db.example.insert([
        {_id: 0, screenSize: 30},
        {_id: 1, screenSize: 24},
        {_id: 2, screenSize: 42},
        {_id: 3, screenSize: 22},
        {_id: 4, screenSize: 55},
        {_id: 5, screenSize: "not-a-number"},
        {_id: 6, screenSize: null},
      ]);
      > db.example.aggregate([
        {
          $bucket: {
            groupBy: "$screenSize",
            boundaries: [0, 24, 32, 50, Infinity],
            // No default specified.
          }
        }
      ])
      Error!
       
      // =====================  Example #3 - Group unexpected types together =====================
      > db.example.insert([
        {_id: 0, screenSize: 30},
        {_id: 1, screenSize: 24},
        {_id: 2, screenSize: 42},
        {_id: 3, screenSize: 22},
        {_id: 4, screenSize: 55},
        {_id: 5, screenSize: "not-a-number"},
        {_id: 6, screenSize: null},
      ]);
      > db.example.aggregate([
        {
          $bucket: {
            groupBy: "$screenSize",
            boundaries: [0, 24, 32, 50, Infinity],
            default: "other",
            output: {
              count: {$sum: 1},
              matches: {$push: "$screenSize"}
            }
          }
        }
      ])
      {_id: "other", count: 2, matches: [null, "not-a-number"]} 
      {_id: 0, count: 1, matches: [22]}
      {_id: 24, count: 2, matches: [24, 30]}
      {_id: 32, count: 1, matches: [42]}
      {_id: 50, count: 1, matches: [55]}
      

      Detailed Behavior

      • This stage is very similar to the following:

        {
          $group: {
            _id: <some sort of $switch expression on the 'groupBy' to put it into buckets based on 'boundaries'>,
            fieldName1: <accumulator 1>
            fieldName2: <accumulator 2>
          }
        }
        

      • 'boundaries' must be constant values (can't use "$x", but can use {$add: [4, 5]}), and must be sorted.
      • Output documents will be sorted (in the order entered).
      • Values that are equal to a boundary will go into the range with that boundary as the minimum.
      • For example, with boundaries [0, 2, 4], 2 would go into the bucket with an _id of 2, which would encompass the range [2, 4)
      • The only expected use cases are numeric values and date-like types, but we'll support arbitrary values.
      • This will have to follow the operation's collation when comparing strings.
      • There is no way to assign labels to the buckets, the _id of the output document will be the minimum value for that bucket's range.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: