[SERVER-31831] Improve aggregation set operations for array of objects Created: 03/Nov/17 Updated: 06/Dec/22
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Joel Goldfinger | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Optimization |
| Participants: | |
| Description |
Currently, it is hard to compute stats on, or transform, a nested array of objects in which a subset of the fields makes up the key (hash) of each object. It is possible with $unwind and $group, but that becomes a problem when operating on multiple fields in the documents at the same time. The other way is to $map using $concat on the key fields -> $setUnion -> $map -> $reduce using $cond with $in, but that is far too slow. It would be helpful if the set operations allowed specifying a comparison function. When a custom comparison function is provided, $setUnion, $setIntersection, and $setDifference could take a mandatory reduce function to merge duplicates.
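To make the requested behaviour concrete, here is a minimal Python sketch of what a keyed set union with a merge step might compute. It is not MongoDB syntax and not a proposed implementation: the function name `keyed_set_union`, its `key` and `merge` parameters, and the `count` field are all hypothetical, and the key fields `a` and `b` are borrowed from the discussion in the comments.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List

Doc = Dict[str, Any]


def keyed_set_union(
    arrays: Iterable[Iterable[Doc]],
    key: Callable[[Doc], Hashable],
    merge: Callable[[Doc, Doc], Doc],
) -> List[Doc]:
    """Union several arrays of objects, treating objects with equal keys as duplicates.

    Hypothetical helper: `key` plays the role of the custom comparison,
    `merge` plays the role of the mandatory reduce function for duplicates.
    """
    merged: Dict[Hashable, Doc] = {}
    for array in arrays:
        for item in array:
            k = key(item)
            # First occurrence is kept as-is; later duplicates are folded in.
            merged[k] = merge(merged[k], item) if k in merged else item
    # Python dicts preserve insertion order, so output order is first-seen order.
    return list(merged.values())


left = [{"a": 1, "b": 1, "count": 2}, {"a": 1, "b": 2, "count": 5}]
right = [{"a": 1, "b": 1, "count": 3}, {"a": 2, "b": 2, "count": 1}]

result = keyed_set_union(
    [left, right],
    key=lambda d: (d["a"], d["b"]),  # fields (a, b) form the key
    merge=lambda x, y: {**x, "count": x["count"] + y["count"]},  # sum the leftover field
)
print(result)
```

Here the two objects that share `a = 1, b = 1` collapse into one whose `count` is the sum of both, while the objects with unique keys pass through unchanged.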
| Comments |
| Comment by Asya Kamsky [ 16/Dec/17 ] |
Agreed that internal implementation can be more efficient.
| Comment by Joel Goldfinger [ 04/Dec/17 ] |
It works, but it is pretty slow on a large data set. It would probably be faster if the items could be stored in a hash instead of an array. As implemented, it has to scan the entire initialValue array for each item in the reduce input. Thanks.
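The cost difference described in this comment can be illustrated outside the server with a small Python sketch. It only models the two strategies (scanning an accumulator array for every input item versus looking the key up in a hash) and says nothing about how the server evaluates $reduce internally. The function names, the counters, and the field `n` are hypothetical.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Doc = Dict[str, Any]


def merge_by_scan(
    items: List[Doc],
    key: Callable[[Doc], Hashable],
    merge: Callable[[Doc, Doc], Doc],
) -> Tuple[List[Doc], int]:
    """Accumulate into an array, scanning it for a matching key on every item."""
    acc: List[Doc] = []
    comparisons = 0
    for item in items:
        for i, existing in enumerate(acc):
            comparisons += 1
            if key(existing) == key(item):
                acc[i] = merge(existing, item)
                break
        else:
            # No match found after scanning the whole accumulator.
            acc.append(item)
    return acc, comparisons


def merge_by_hash(
    items: List[Doc],
    key: Callable[[Doc], Hashable],
    merge: Callable[[Doc, Doc], Doc],
) -> Tuple[List[Doc], int]:
    """Accumulate into a dict keyed by the object's key: one lookup per item."""
    acc: Dict[Hashable, Doc] = {}
    lookups = 0
    for item in items:
        k = key(item)
        lookups += 1
        acc[k] = merge(acc[k], item) if k in acc else item
    return list(acc.values()), lookups


def by_ab(d: Doc) -> Hashable:
    return (d["a"], d["b"])


def add_n(x: Doc, y: Doc) -> Doc:
    return {**x, "n": x["n"] + y["n"]}


# Worst case for the scan: every key is distinct, so each item scans everything so far.
items = [{"a": i, "b": 0, "n": 1} for i in range(200)]

scan_result, comparisons = merge_by_scan(items, by_ab, add_n)
hash_result, lookups = merge_by_hash(items, by_ab, add_n)
print(comparisons, lookups)
```

Both functions return the same merged array, but the scan performs a number of comparisons that grows quadratically with the number of distinct keys, while the hash version performs one lookup per input item.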
| Comment by Asya Kamsky [ 10/Nov/17 ] |
I agree that $unwind and $group is definitely not the right way to do this as it would be very slow. I'm also not sure if your sample code does what you describe exactly - I think you want to get a set of all unique "a, b" values and then do something else with the leftover field, like add it, right? This can already be done with $reduce, in other words, replace $setXXXX with $reduce in your proposed syntax and use $let to assign the appropriate element to variables and do calculations. Something like the following:
Does this do what the syntax you are proposing would do? If I misunderstood what you are trying to do, please let me know.
Asya
| Comment by Mark Agarunov [ 06/Nov/17 ] |
Hello devnopt,
Thank you for the detailed example. I've set the fixVersion to "Needs Triage" for this new feature to be scheduled against our currently planned work. Updates will be posted on this ticket as they happen.
Thanks,