[SERVER-23815] Add $bucket aggregation stage Created: 19/Apr/16  Updated: 05/Dec/16  Resolved: 07/Jul/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.3.10

Type: New Feature Priority: Major - P3
Reporter: Charlie Swanson Assignee: Sally McNichols
Resolution: Done Votes: 0
Labels: stage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by CSHARP-1716 Add support for $bucket aggregation s... Closed
is depended on by JAVA-2256 Add builder for $bucket aggregation s... Closed
Documented
is documented by DOCS-9493 Docs for SERVER-23815: Add $bucket ag... Closed
Related
related to DRIVERS-297 Aggregation Framework Support for 3.4 Closed
is related to SERVER-1635 Faceting support Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 16 (06/24/16), Query 17 (07/15/16)
Participants:

 Description   

Syntax

{
  $bucket: {
    groupBy: <arbitrary expression>,
    boundaries: [
      // Array of constant values.
    ],
    default: <constant expression>,
    output: {  // Still optional, defaults to {count: {$sum: 1}}
      fieldName1: <accumulator 1>,
      fieldName2: <accumulator 2>
    }
  }
}

Examples

> db.example.insert([
  {_id: 0, screenSize: 30},
  {_id: 1, screenSize: 24},
  {_id: 2, screenSize: 42},
  {_id: 3, screenSize: 22},
  {_id: 4, screenSize: 55},
]);
> db.example.aggregate([
  {
    $bucket: {
      groupBy: "$screenSize",
      boundaries: [0, 24, 32, 50, Infinity],  // Makes 4 buckets, the _id is based on the min:
                                              // [0, 24), [24, 32), [32, 50), [50, Infinity)
      default: "other",
      output: {
        count: {$sum: 1},
        matches: {$push: "$screenSize"}
      }
    }
  }
])
{_id: 0, count: 1, matches: [22]}
{_id: 24, count: 2, matches: [24, 30]}
{_id: 32, count: 1, matches: [42]}
{_id: 50, count: 1, matches: [55]}
{_id: "other", count: 2, matches: [null, "not-a-number"]} 
 
// =====================  Example #2 - Error on unexpected types ==========================
> db.example.insert([
  {_id: 0, screenSize: 30},
  {_id: 1, screenSize: 24},
  {_id: 2, screenSize: 42},
  {_id: 3, screenSize: 22},
  {_id: 4, screenSize: 55},
  {_id: 5, screenSize: "not-a-number"},
  {_id: 6, screenSize: null},
]);
> db.example.aggregate([
  {
    $bucket: {
      groupBy: "$screenSize",
      boundaries: [0, 24, 32, 50, Infinity],
      // No default specified.
    }
  }
])
Error!
 
// =====================  Example #3 - Group unexpected types together =====================
> db.example.insert([
  {_id: 0, screenSize: 30},
  {_id: 1, screenSize: 24},
  {_id: 2, screenSize: 42},
  {_id: 3, screenSize: 22},
  {_id: 4, screenSize: 55},
  {_id: 5, screenSize: "not-a-number"},
  {_id: 6, screenSize: null},
]);
> db.example.aggregate([
  {
    $bucket: {
      groupBy: "$screenSize",
      boundaries: [0, 24, 32, 50, Infinity],
      default: "other",
      output: {
        count: {$sum: 1},
        matches: {$push: "$screenSize"}
      }
    }
  }
])
{_id: 0, count: 1, matches: [22]}
{_id: 24, count: 2, matches: [24, 30]}
{_id: 32, count: 1, matches: [42]}
{_id: 50, count: 1, matches: [55]}
{_id: "other", count: 2, matches: [null, "not-a-number"]} 

Detailed Behavior

  • This stage is very similar to the following:

    {
      $group: {
        _id: <some sort of $switch expression on the 'groupBy' to put it into buckets based on 'boundaries'>,
        fieldName1: <accumulator 1>
        fieldName2: <accumulator 2>
      }
    }
    

  • 'boundaries' must be constant values (can't use "$x", but can use {$add: [4, 5]}), and must be sorted.
  • Output documents will be sorted (in the order entered).
  • Values that are equal to a boundary will go into the range with that boundary as the minimum.
  • For example, with boundaries [0, 2, 4], 2 would go into the bucket with an _id of 2, which would encompass the range [2, 4)
  • The only expected use cases are numeric values and date-like types, but we'll support arbitrary values.
  • This will have to follow the operation's collation when comparing strings.
  • There is no way to assign labels to the buckets, the _id of the output document will be the minimum value for that bucket's range.


 Comments   
Comment by Charlie Swanson [ 21/Jul/16 ]

Before we open another ticket for this, I'd like to add that buzz.moschetti's request was considered as part of the design process, but considered unnecessary. With the addition of the $switch expression in SERVER-10689, it's not that hard to construct your own $group stage to create non-consecutive buckets. This approach should allow full customization of bucket definitions.

Comment by Buzz Moschetti [ 20/Jul/16 ]

Is the interface to $bucket cast in stone? In particular, non-contiguous buckets, like 0-10, 20-100, 1000+?
Attached above is some bucketing code I use to help build ... buckets. Perhaps it can serve as a mild requirements "spec" (term VERY loosely applied here).

EDIT Just saw this is closed.

Comment by Githook User [ 07/Jul/16 ]

Author:

{u'username': u'smcnichols', u'name': u'Sally McNichols', u'email': u'sally.mcnichols@mongodb.com'}

Message: SERVER-23815 add $bucket aggregation stage
Branch: master
https://github.com/mongodb/mongo/commit/d7bae36fa7ceabda22946903a6dee2b895d24559

Generated at Thu Feb 08 04:04:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.