[SERVER-58784] Do the $setUnion, $setDifference, and $setIntersection operations have any guarantees around order? Created: 22/Jul/21  Updated: 27/Oct/23  Resolved: 30/Aug/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: David Storch Assignee: Jennifer Peshansky (Inactive)
Resolution: Works as Designed Votes: 0
Labels: sbe-diff
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Sprint: QE 2021-09-06
Participants:
Linked BF Score: 135

 Description   

The names imply that these are set operations, which means that the order of the output array can be arbitrary. However, the implementation in the Expression class will always produce particular output orders. In particular, the output order for $setDifference is given by the order of the input array:

MongoDB Enterprise > db.c.drop()
MongoDB Enterprise > db.c.insert({lhs: [5, 4, 3, 2, 1], rhs: [4, 2]})
 
// This will always produce 5, 3, 1 due to the ordering of "$lhs".
MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setDifference: ["$lhs", "$rhs"]}}}])
{ "_id" : ObjectId("60f9cc394f0e6d565685249e"), "out" : [ 5, 3, 1 ] }

The $setUnion expression always outputs a sorted array:

MongoDB Enterprise > db.c.drop()
MongoDB Enterprise > db.c.insert({lhs: [8, 2, 6], rhs: [4, 2, 1, 7]})
MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setUnion: ["$lhs", "$rhs"]}}}])
{ "_id" : ObjectId("60f9cca54f0e6d565685249f"), "out" : [ 1, 2, 4, 6, 7, 8 ] }

And the $setIntersection expression also always returns a sorted array:

MongoDB Enterprise > db.c.drop()
MongoDB Enterprise > db.c.insert({lhs: [8, 2, 6, 1], rhs: [4, 2, 1, 7, 6]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.aggregate([{$project: {out: {$setIntersection: ["$lhs", "$rhs"]}}}])
{ "_id" : ObjectId("60f9cd204f0e6d56568524a0"), "out" : [ 1, 2, 6 ] }

SBE's implementation, on the other hand, provides no such guarantees. It uses an unordered flat hash set for the output array under the hood, which means the output array has an arbitrary order. Is this acceptable given the lack of ordering guarantees for the set expressions, or should we change the implementation in SBE to produce arrays ordered in the same way as the classic engine would?



 Comments   
Comment by Jennifer Peshansky (Inactive) [ 30/Aug/21 ]

There are no guarantees around order for set operations, so this behavior difference is acceptable. No follow-up work is required, because the fuzzer already ignores the order of output arrays.

Comment by Kyle Suarez [ 19/Aug/21 ]

Eric is on vacation next week, so moving this over to Jenny for next sprint.

Generated at Thu Feb 08 05:45:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.