[SERVER-20168] Add an option to $unwind to include documents with empty arrays Created: 27/Aug/15  Updated: 06/Nov/17  Resolved: 07/Oct/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.1.7
Fix Version/s: 3.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Charlie Swanson
Resolution: Done Votes: 0
Labels: todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CSHARP-1401 Add an option to $unwind to include d... Closed
related to JAVA-1957 Add an option to $unwind to include d... Closed
is related to DRIVERS-234 Aggregation Builder Support for 3.2 Closed
is related to SERVER-12685 Expand $unwind behavior to include em... Closed
Backwards Compatibility: Major Change
Sprint: Quint 9 09/18/15, QuInt A (10/12/15)
Participants:

 Description   

The stage {$unwind: '$field'} currently produces no output documents for an input with any of the following criteria:

  1. field is not present.
  2. field is present, but has a value of null.
  3. field is an empty array

SERVER-12685 is requesting that condition 1 can optionally produce output. This ticket is requesting that condition 3 can optionally produce output. For example, with a syntax like

{
  $unwind: {
    path: '$field',
    includeEmpty: <bool>
  }
}

The old syntax of {unwind: '$field'} should still be accepted as well.

This would enable a $lookUp + $unwind to produce a stream of documents that is equivalent to a left outer join. Currently this is not possible as the $lookUp stage produces an empty array for documents which have no matches in the foreign collection. If one were to unwind the as field afterwards, such a non-matching document would not produce any results.

e.g. If you have these collections:

products
{_id: 'tshirt', p: 12}
{_id: 'shorts', p: 30}
sales
{_id: 0, product_id: 'tshirt', q: 2}
{_id: 1, product_id: 'tshirt', q: 1}

And you use the following pipeline:

db.products.aggregate([
  {
    $lookUp: {
      from: 'sales',
      localField: '$_id',
      foreignField: '$product_id',
      as: 'sale'
  },
// Produces two documents:
// {_id: 'tshirt', p: 12, sale: [{_id: 0, ...}, {_id: 1, ...} ]}
// {_id: 'shorts', p: 30, sale: []}
  {
    $unwind: '$sale'
  }
]) 

You'll get the following output:

{_id: 'tshirt', p: 30, sale: {_id: 0, product_id: 0, q: 2}}
{_id: 'tshirt', p: 30, sale: {_id: 1, product_id: 0, q: 1}}

Which does not include any output for the second document, and hence is not an outer join.

If you replaced the $unwind stage with the proposed

{$unwind: {field: '$sale', includeEmpty: true}}

You would instead get the following results:

{_id: 'tshirt', p: 30, sale: {_id: 0, product_id: 0, q: 2}}
{_id: 'tshirt', p: 30, sale: {_id: 1, product_id: 0, q: 1}}
{_id: 'shorts', p: 12, sale: null}

It may also be desirable to specify a default value in this case rather than always using null



 Comments   
Comment by Githook User [ 13/Oct/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Empty array should not be present in document after $unwind
Branch: artree
https://github.com/10gen/mongo-enterprise-modules/commit/02d05965c425b95c7cf56110b3e8a4b64c916526

Comment by Githook User [ 13/Oct/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Add option to $unwind stage for treatment of nullish values

Adds logic to $lookUp to handle the preserveNullAndEmptyArrays parameter from
the $unwind stage if the $lookUp stage had coalesced with an $unwind stage.
Branch: artree
https://github.com/10gen/mongo-enterprise-modules/commit/d1e4c10271c3231de10fcc6756a400a130536c88

Comment by Githook User [ 07/Oct/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Empty array should not be present in document after $unwind
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/02d05965c425b95c7cf56110b3e8a4b64c916526

Comment by Githook User [ 07/Oct/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Empty array should not be present in document after $unwind
Branch: master
https://github.com/mongodb/mongo/commit/c7ddaa120d2041ce600d504b1bae149353b80fdc

Comment by Charlie Swanson [ 06/Oct/15 ]

There is a concern that preserveNullAndEmptyArrays is not really the correct name anymore, and perhaps we should use a different one, e.g. includeNullsAndEmptyArrays. I think that currently, the 'preserve' is meant to mean that the documents themselves will be preserved through this stage in the pipeline. That is, each document will have at least one output, none will be lost. Before, the 'preserve' could also be taken to mean 'preserve the whole document untouched through this stage'. So yes, the name is not as great a fit after the behavior change.

However, I don't see a great reason to change the name this late in the game. True, the name is less applicable than it was before the adjustment of how empty arrays are handled, but most people will not even know of the previous behavior. The name is still descriptive of what happens, just with a different reading of 'preserve', and as long as the documentation is clear, there shouldn't be much confusion. I don't anticipate anyone being upset because they wanted and expected an empty array to be literally preserved. Perhaps a bit confused, but again, good documentation should mitigate this. Additionally, and this is more of an inkling, includeNullsAndEmptyArrays seems to me to imply that nulls and empty arrays within the array were not previously being included. preserveNullsAndEmptyArrays doesn't imply that to me. Not sure why though.

Comment by Justin Lee [ 02/Oct/15 ]

Looks good to me, too.

Comment by Craig Wilson [ 02/Oct/15 ]

Yes, This works for me.

Comment by Charlie Swanson [ 01/Oct/15 ]

Re-opening to track the work of described in previous comments. We have decided to go with a slight modification of the changes described in this comment. That is, with the 'preserveNullAndEmptyArrays' option:

  1. null values, undefined values, and missing values will pass through unchanged.
  2. Empty arrays will not be present in the output. For example (inventing some briefer syntax which is hopefully still clear): unwind('a', {_id: 0, a: []}) becomes {_id: 0}

This will allow all non-array values to be treated similarly (A string will already pass through an unwind stage unchanged), but still solves the problem of having mixed types in the case of an empty array.

So for complete clarity, here are the new intended behaviors:

> db.foo.find()
{_id: 0, a: ['string', 'other string']}
{_id: 1, a: []}
{_id: 2}
{_id: 3, a: null}
{_id: 4, a: undefined}
{_id: 5, a: 'non-array, but still a string'}
{_id: 6, a: 5}
{_id: 7, a: [4, 'string', null]}
...
> db.foo.aggregate([{$unwind: {path: '$a', preserveNullAndEmptyArrays: true}])
{_id: 0, a: 'string'}
{_id: 0, a: 'other string'}
{_id: 1}
{_id: 2}
{_id: 3, a: null}
{_id: 4, a: undefined}
{_id: 5, a: 'non-array, but still a string'}
{_id: 6, a: 5}
{_id: 7, a: 4}
{_id: 7, a: 'string'}
{_id: 7, a: null}
...

cc craiggwilson, asya, justin.lee, this still works for your purposes, right?

Comment by Asya Kamsky [ 01/Oct/15 ]

In this example, I can see _id:3 resulting in either missing "a" or "a" being null. null is a little more consistent with _id 6 result. I think either one is okay.

Comment by Charlie Swanson [ 01/Oct/15 ]

There's been some unintended consequences of the behavior of this option with empty arrays. As it is implemented, it makes it hard to infer the type of the field in the output document. For example, let's say you somehow knew/enforced that the field 'a' in every document was an array of strings (allowed to be empty):

> db.foo.find()
{_id: 0, a: ['string', 'other string']}
{_id: 1, a: []}
...

Logically, if you performed the following aggregation:

> db.foo.aggregate([{$unwind: '$a'}])
{_id: 0, a: 'string'}
{_id: 0, a: 'other string'}
...

You could correctly deduce that every resulting document to have a string value for the field 'a'. However, if you included the new option:

> db.foo.aggregate([{$unwind: {path: '$a', preserveNullAndEmptyArrays: true}])
{_id: 0, a: 'string'}
{_id: 0, a: 'other string'}
{_id: 1, a: []}
...

The resulting documents sometimes have a string for the field 'a', and sometimes have an array. This makes it difficult for a typed query language to handle the result.

The intent of this new option was to ensure every document has at least 1 resulting document after an $unwind. How the nullish values appear in the output document does not appear to be critical for the use case it was intended to address. As such, we should adjust the behavior to make the above scenario have stronger type guarantees. In particular, in an $unwind with preserveNullAndEmptyArrays, all nullish fields (including empty arrays), should not appear in the output document. This would give the following behavior:

> db.foo.find()
{_id: 0, a: ['string', 'other string']}
{_id: 1, a: []}
{_id: 2}
{_id: 3, a: null}
{_id: 4, a: 'non-array, but still a string'}
{_id: 5, a: 5}
{_id: 6, a: [4, 'string', null]}
...
> db.foo.aggregate([{$unwind: {path: '$a', preserveNullAndEmptyArrays: true}])
{_id: 0, a: 'string'}
{_id: 0, a: 'other string'}
{_id: 1}
{_id: 2}
{_id: 3}
{_id: 4, a: 'non-array, but still a string'}
{_id: 5, a: 5}
{_id: 6, a: 4}
{_id: 6, a: 'string'}
{_id: 6, a: null}
...

Documents would still be preserved in the sense that each input has at least one output, but the values of the nullish fields would not be preserved.

Comment by Githook User [ 11/Sep/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Add option to $unwind stage for treatment of nullish values

Adds a preserveNullAndEmptyArrays option to the $unwind stage. If this option
is specified, the $unwind stage will allow documents with nullish values for
the specified unwinding path, or those with an empty array as the value for
that path, to pass through unaltered.

Note this does not affect the behavior of non-nullish, non-array values.
Branch: master
https://github.com/mongodb/mongo/commit/3cb6039e5d2968b1e361bf592452b95c7a76c770

Comment by Githook User [ 11/Sep/15 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-20168 Add option to $unwind stage for treatment of nullish values

Adds logic to $lookUp to handle the preserveNullAndEmptyArrays parameter from
the $unwind stage if the $lookUp stage had coalesced with an $unwind stage.
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/d1e4c10271c3231de10fcc6756a400a130536c88

Generated at Thu Feb 08 03:53:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.