[SERVER-20168] Add an option to $unwind to include documents with empty arrays Created: 27/Aug/15 Updated: 06/Nov/17 Resolved: 07/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.1.7 |
| Fix Version/s: | 3.2.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Charlie Swanson |
| Resolution: | Done | Votes: | 0 |
| Labels: | todo_in_code | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Backwards Compatibility: | Major Change | ||||||||||||||||||||
| Sprint: | Quint 9 09/18/15, QuInt A (10/12/15) | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
The stage {$unwind: '$field'} currently produces no output documents for an input with any of the following criteria:
The old syntax of {unwind: '$field'} should still be accepted as well. This would enable a $lookUp + $unwind to produce a stream of documents that is equivalent to a left outer join. Currently this is not possible as the $lookUp stage produces an empty array for documents which have no matches in the foreign collection. If one were to unwind the as field afterwards, such a non-matching document would not produce any results. e.g. If you have these collections:
And you use the following pipeline:
You'll get the following output:
Which does not include any output for the second document, and hence is not an outer join. If you replaced the $unwind stage with the proposed
You would instead get the following results:
It may also be desirable to specify a default value in this case rather than always using null |
| Comments |
| Comment by Githook User [ 13/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 13/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: Adds logic to $lookUp to handle the preserveNullAndEmptyArrays parameter from | ||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 07/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 07/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 06/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
There is a concern that preserveNullAndEmptyArrays is not really the correct name anymore, and perhaps we should use a different one, e.g. includeNullsAndEmptyArrays. I think that currently, the 'preserve' is meant to mean that the documents themselves will be preserved through this stage in the pipeline. That is, each document will have at least one output, none will be lost. Before, the 'preserve' could also be taken to mean 'preserve the whole document untouched through this stage'. So yes, the name is not as great a fit after the behavior change. However, I don't see a great reason to change the name this late in the game. True, the name is less applicable than it was before the adjustment of how empty arrays are handled, but most people will not even know of the previous behavior. The name is still descriptive of what happens, just with a different reading of 'preserve', and as long as the documentation is clear, there shouldn't be much confusion. I don't anticipate anyone being upset because they wanted and expected an empty array to be literally preserved. Perhaps a bit confused, but again, good documentation should mitigate this. Additionally, and this is more of an inkling, includeNullsAndEmptyArrays seems to me to imply that nulls and empty arrays within the array were not previously being included. preserveNullsAndEmptyArrays doesn't imply that to me. Not sure why though. | ||||||||||||||||||||||||||||||||||
| Comment by Justin Lee [ 02/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Looks good to me, too. | ||||||||||||||||||||||||||||||||||
| Comment by Craig Wilson [ 02/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Yes, This works for me. | ||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 01/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
Re-opening to track the work of described in previous comments. We have decided to go with a slight modification of the changes described in this comment. That is, with the 'preserveNullAndEmptyArrays' option:
This will allow all non-array values to be treated similarly (A string will already pass through an unwind stage unchanged), but still solves the problem of having mixed types in the case of an empty array. So for complete clarity, here are the new intended behaviors:
cc craiggwilson, asya, justin.lee, this still works for your purposes, right? | ||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 01/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
In this example, I can see _id:3 resulting in either missing "a" or "a" being null. null is a little more consistent with _id 6 result. I think either one is okay. | ||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 01/Oct/15 ] | ||||||||||||||||||||||||||||||||||
|
There's been some unintended consequences of the behavior of this option with empty arrays. As it is implemented, it makes it hard to infer the type of the field in the output document. For example, let's say you somehow knew/enforced that the field 'a' in every document was an array of strings (allowed to be empty):
Logically, if you performed the following aggregation:
You could correctly deduce that every resulting document to have a string value for the field 'a'. However, if you included the new option:
The resulting documents sometimes have a string for the field 'a', and sometimes have an array. This makes it difficult for a typed query language to handle the result. The intent of this new option was to ensure every document has at least 1 resulting document after an $unwind. How the nullish values appear in the output document does not appear to be critical for the use case it was intended to address. As such, we should adjust the behavior to make the above scenario have stronger type guarantees. In particular, in an $unwind with preserveNullAndEmptyArrays, all nullish fields (including empty arrays), should not appear in the output document. This would give the following behavior:
Documents would still be preserved in the sense that each input has at least one output, but the values of the nullish fields would not be preserved. | ||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 11/Sep/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: Adds a preserveNullAndEmptyArrays option to the $unwind stage. If this option Note this does not affect the behavior of non-nullish, non-array values. | ||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 11/Sep/15 ] | ||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}Message: Adds logic to $lookUp to handle the preserveNullAndEmptyArrays parameter from |