Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-58784

Do the $setUnion, $setDifference, and $setIntersection operations have any guarantees around order?

    XMLWordPrintable

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Sprint:
      QE 2021-09-06

      Description

      The names imply that these are set operations, which means that the order of the output array can be arbitrary. However, the implementation in the Expression class will always produce particular output orders. In particular, the output order for $setDifference is given by the order of the input array:

      MongoDB Enterprise > db.c.drop()
      MongoDB Enterprise > db.c.insert({lhs: [5, 4, 3, 2, 1], rhs: [4, 2]})
       
      // This will always produce 5, 3, 1 due to the ordering of "$lhs".
      MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setDifference: ["$lhs", "$rhs"]}}}])
      { "_id" : ObjectId("60f9cc394f0e6d565685249e"), "out" : [ 5, 3, 1 ] }
      

      The $setUnion expression always outputs a sorted array:

      MongoDB Enterprise > db.c.drop()
      MongoDB Enterprise > db.c.insert({lhs: [8, 2, 6], rhs: [4, 2, 1, 7]})
      MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setUnion: ["$lhs", "$rhs"]}}}])
      { "_id" : ObjectId("60f9cca54f0e6d565685249f"), "out" : [ 1, 2, 4, 6, 7, 8 ] }
      

      And the $setIntersection expression also always returns a sorted array:

      MongoDB Enterprise > db.c.drop()
      MongoDB Enterprise > db.c.insert({lhs: [8, 2, 6, 1], rhs: [4, 2, 1, 7, 6]})
      WriteResult({ "nInserted" : 1 })
      MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setIntersection: ["$lhs", "$rhs"]}}}])
      { "_id" : ObjectId("60f9cd204f0e6d56568524a0"), "out" : [ 1, 2, 6 ] }
      

      SBE's implementation, on the other hand, provides no such guarantees. It uses an unordered flat hash set for the output array under the hood, which means the output array has an arbitrary order. Is this acceptable given the lack of ordering guarantees for the set expressions, or should we change the implementation in SBE to produce arrays ordered in the same way as the classic engine would?

        Attachments

          Activity

            People

            Assignee:
            jennifer.peshansky Jennifer Peshansky
            Reporter:
            david.storch David Storch
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: