[SERVER-36625] Add preserveNull and preserveEmptyArrays as $unwind options Created: 13/Aug/18 Updated: 27/Oct/18 Resolved: 27/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor - P4 |
| Reporter: | Patrick Meredith | Assignee: | Asya Kamsky |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Participants: | |||||
| Description |
|
There are many cases in the BI-Connector (and presumably in customer usage) where the distinction between empty and null array fields in an $unwind must be distinguished, e.g., when mimicking SQL left joins using $unwinds. Adding these two options would allow for easy distinguishing. For the BI-Connector, we really only need preserveNull, but having the symmetric preserveEmptyArrays could utility. |
| Comments |
| Comment by Asya Kamsky [ 09/Oct/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
patrick.meredith
before the $unwind with preserveNullAndEmptyArrays I'm not sure that any option that we add to the server will make your pipeline any faster (or simpler). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 28/Sep/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
First comment said: > A work-around for this right now is adding a $match before the $unwind, but this has performance implications. What is it that would be being matched before $unwind? Is this ticket about $unwind or is it about $lookup? i.e. if $lookup had an option to ignore null in local documents would that address your use case? > A left join between a parent document and the array inside of it Is the use case here a join ($lookup followed by $unwind) or any $unwind?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 27/Sep/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I don’t understand why the simplest thing isn’t to do appropriate $filter of the array to make sure you get the desired results from the $unwind. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Patrick Meredith [ 14/Sep/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Perhaps the best thing would be "preserveRootDocuments": something that preserves one copy of the root document with a null entry for the array field. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Patrick Meredith [ 14/Sep/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A left join between a parent document and the array inside of it, which is rendered as two separate SQL tables in our mapping. Actually, to be fair, this also applies to $lookup joins since we $unwind those as well. In either case, we want to keep the empty array, but remove null values from the array because the semantics of a SQL left join are such that you have at least one copy of every row on the left side: this is equivalent to not dropping documents that contain the empty array. The problem I am seeing now is how to cleanly support [null] because the semantics we actually need are to keep at least one copy of the original document regardless of array contents, but to keep all of the non-NULL array contents. Ultimately, it might make sense to have a mode that somehow alludes to having left join semantics. Anyway, examples: The following two tables are derived from one collection, the second table is an array that exists across the documents, consult the _id column to know which documents correspond:
So, we can optimize these to a simple unwind in a join where the _ids are constrained to be the same:
The resulting pipeline is:
The addfields here is so that we can correctly generate the _ids on the rhs that should be NULL.
One field is removed; because we can't rely on preserveNullAndEmpty, we have to construct this leftjoin_exclude filter. Now watch what happens if we change this such that the rhs for all _id=2 disappear:
Because we need the at least one copy of each lhs. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 31/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'm confused because you say: > Nulls inside an array should be dropped But this option is only relevant to array having types missing, null or explicitly []. If you have [ null ] it will not be affected by $unwind option. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 31/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
patrick.meredith when emulating a join with $unwind – are you referring to underlying collection having an array in a field that needs to be flattened? And you want to eliminate arrays which are null or field not present and keep ones that have explicit []? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Patrick Meredith [ 24/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It’s not for $lookup, it’s for mimicking left joins with unwind. Nulls inside an array should be dropped, but the empty array must be kept. So this is actually the argument for preserveEmptyArrays. For mapping polymoprhic data as a SQL schema, we want preserveNull, however, and it’s also good to have ful symmetry. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 24/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
ping patrick.meredith | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 17/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
patrick.meredith can you give an example for BI Connector - as far as I can tell when you do any $lookup you get an empty array and you can never get null - why is the existing option not adequate? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Patrick Meredith [ 13/Aug/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A work-around for this right now is adding a $match before the $unwind, but this has performance implications. |