Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-85337

$bucketAuto $first does not obey input order

    • Query Optimization
    • Fully Compatible
    • ALL
    • QO 2024-02-05, QO 2024-02-19, QO 2024-03-04
    • 110

      If I sort by {a: -1} and use $first, I should get the largest 'a' value within each group.

      For example, using $bucket I get the expected result:

      > db.c.find()
      { "_id" : 0, "a" : 0 }
      { "_id" : 1, "a" : 1 }
      { "_id" : 2, "a" : 2 }
      { "_id" : 3, "a" : 3 }
      { "_id" : 4, "a" : 4 }
      { "_id" : 5, "a" : 5 }
      { "_id" : 6, "a" : 6 }
      { "_id" : 7, "a" : 7 }
      { "_id" : 8, "a" : 8 }
      { "_id" : 9, "a" : 9 }
      
      > db.c.aggregate([ {$sort: {a: -1}}, {$bucket: {groupBy: "$a", boundaries: [0, 5, 10], output: {a: {$first: "$a"}} }} ])
      { "_id" : 0, "a" : 4 }
      { "_id" : 5, "a" : 9 }
      

      While the same thing with $bucketAuto gives the smallest value within each group:

      > db.c.aggregate([ {$sort: {a: -1}}, {$bucketAuto: {groupBy: "$a", buckets: 2, output: {a: {$first: "$a"}} }} ])
      { "_id" : { "min" : 0, "max" : 5 }, "a" : 0 }
      { "_id" : { "min" : 5, "max" : 9 }, "a" : 5 }
      

      It looks like internally $bucketAuto is doing something like:

      • stable-sort by the group key
      • scan the sorted input
        • accumulate each document into the current bucket
        • start a new bucket every N documents (based on input size / num buckets)

      But this doesn’t work for order-sensitive accumulators like $first, because the sort loses some information about the original order--even using a stable sort. The documents within each dynamically-chosen bucket are sorted by (group key, input position), instead of by input position.

            Assignee:
            chii.huang@mongodb.com Chi-I Huang
            Reporter:
            david.percy@mongodb.com David Percy
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: