[SERVER-43400] Easier $group after $unwind when it is desired to just recombine all fields Created: 20/Sep/19  Updated: 05/Nov/19  Resolved: 05/Nov/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.2.0
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Ben Rotz Assignee: Asya Kamsky
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I always dread having to $group after an $unwind. I find myself commonly performing $unwind so I can do operations on the children like $lookup.

A common simplified scenario might be getting the total inventory count for items in a basket:

db.basket.insertMany([
  {
    person: 'andrea',
    basket: [
      {name: 'apple', qty: 2},
      {name: 'banana', qty: 3},
      {name: 'cucumber', qty: 1},
    ]
  }, {
    person: 'bob',
    basket: [
      {name: 'banana', qty: 20}
    ]
  }, {
    person: 'curtis',
    basket: []
  }
]);
db.item.insertMany([
  {name: 'apple', location: 'California', inventory: 1},
  {name: 'banana', location: 'California', inventory: 2},
  {name: 'cucumber', location: 'California', inventory: 3},
  {name: 'apple', location: 'Colorado', inventory: 11},
  {name: 'banana', location: 'Colorado', inventory: 12},
  {name: 'cucumber', location: 'Colorado', inventory: 13}
]);

 

Now, to get this, the aggregation is:

db.basket.aggregate([
  {
    $unwind: {
      path: '$basket',
      preserveNullAndEmptyArrays: true
    }
  },
  {$lookup: {
    from: 'item',
    let: {
      's_name': '$basket.name'
    },
    as: 'basket._item',
    pipeline: [
      {$match: {$expr: {$eq: ['$name', '$$s_name']}}},
      {$group: {
        _id: null,
        total_inventory: {$sum: '$inventory'}
      }},
      {$project: {
        _id: 0
      }}
    ]
  }},
  {$unwind: {
    path: '$basket._item',
    preserveNullAndEmptyArrays: true
  }},
  {
    $group: {
      _id: {
        '_id': '$_id'
      },
      person: {
        $first: '$person'
      },
      basket: {
        $push: '$basket'
      }
    }
  }
]);

Now, the 1st $unwind and $lookup stages are fine. The 2nd $unwind stage is a pet peeve of mine (I wish there was just a $lookupOne operator, but that's beside the point).

So that leaves us with the $group stage. You can see the pattern for $grouping after an $unwind is just

 

pathOfThingUnwound: {$push: '$pathOfThingUnwound'}

The problem, annoying thing, however, is having to do all of the $first operations. It gets quite cumbersome to keep track of all of the fields that are in the document up to that point. I imagine the stage `$addFields` was created from this same reasoning (using a $project and repeating everything all over again is tedious).

 

Is there a solution?



 Comments   
Comment by Asya Kamsky [ 05/Nov/19 ]

Given existing workaround, I'm closing this ticket.

 

We are working to identify other ways that similar operations may be easier to express in the aggregation language.

 

Comment by Asya Kamsky [ 05/Nov/19 ]

You can always use $arrayElemAt expression to select the first element from array (whether it has one element or not).

Comment by Ben Rotz [ 05/Nov/19 ]

@Asya is there an easy way to have basket just return an object if I am confident in only getting one match? Similar to Array.find() in javascript?

Use case is for when lookup is going to produce at most 1 item per object in the array that I am mapping to. The problem for me is that my use case is a little different, but I have something like

 

db.product.insertMany([
{name: 'shirt', skus: [{size: 'S'}, {size: 'M'}]},
{name: 'pants', skus: [{size: 'M'}, {size: 'L'}]}
]);
db.inventory.insertMany([
{name: 'shirt', size: 'S', quantity: 1},
{name: 'shirt', size: 'M', quantity: 2},
{name: 'pants', size: 'M', quantity: 3},
{name: 'pants', size: 'L', quantity: 4}
]);
 
 db.product.aggregate([
 
  {$lookup: {
    from: 'inventory',
    let: {
      pname: '$name'
    },
    as: 'inventory',
    pipeline: [
      {$match: {
        $and: [
          {$expr: {$eq: ['$name', '$$pname']}}
        ]
      }}
    ]
  }},
 
  {$addFields: {
    skus: {
      $map: {
        input: '$skus',
        as: 'b',
        in: {
          '$mergeObjects': [
            '$$b',
            {inventory: {
              $filter: {
                'input': '$inventory',
                'cond': {'$eq': ['$$b.size', '$$this.size']}
              }
            }} 
          ]
        }
      }
    },
    inventory: '$$REMOVE'
  }}
 
]);

 

The above produces `skus.inventory []` instead of `skus.inventory {}`, and it's difficult to unwind because it's inside of another array.

I filed SERVER-44441 for such an operator

 

 

Comment by Ben Rotz [ 07/Oct/19 ]

All of my use cases for doing a $group after an $unwind are because of a $lookup on an array as you mentioned. Your awesome pipeline above (from your comment left on Sep 26 2019, 02:24:58 GMT+0000) has alleviated most of my problems as it has given me a new way to do what I was previously doing with an $unwind. So, thank you! I'm OK if this task is closed, because I'm not sure how such an operator would be written, and unless someone else is dying for this feature/functionality, I think there are more important things to be worked on, like having $expr able to use multikey indexes

Thank you

 

Comment by Asya Kamsky [ 07/Oct/19 ]

ben@ethika.com can you comment whether the fundamental request here has to do with $lookup on an array (like SERVER-42306)? Or any general $group improvement that has a number of fields that use $first (or $last)?

Comment by Asya Kamsky [ 26/Sep/19 ]

If I might recommend for your specific case, don't unwind the original array and run this instead:

db.basket.aggregate([
  {$lookup: {     from: 'item',    localField:'basket.name' , as: '_item', foreignField:'name'}}, 
  {$addFields:{
      basket:{$map:{
         input:"$basket", 
         as:"b", 
         in: {$mergeObjects:[
            "$$b", 
            {totalInventory: {$sum:{$let:{
               vars:{f:{$filter:{input:"$_item",cond:{$eq:["$$b.name","$$this.name"]}}}},
               in:"$$f.inventory"
            }}}}
         ]}
     }},
     _item:"$$REMOVE"}}]).pretty()
{
	"_id" : ObjectId("5d8cc53ac1e575f013f3f589"),
	"person" : "andrea",
	"basket" : [
		{
			"name" : "apple",
			"qty" : 2,
			"totalInventory" : 12
		},
		{
			"name" : "banana",
			"qty" : 3,
			"totalInventory" : 14
		},
		{
			"name" : "cucumber",
			"qty" : 1,
			"totalInventory" : 16
		}
	]
}
{
	"_id" : ObjectId("5d8cc53ac1e575f013f3f58a"),
	"person" : "bob",
	"basket" : [
		{
			"name" : "banana",
			"qty" : 20,
			"totalInventory" : 14
		}
	]
}
{
	"_id" : ObjectId("5d8cc53ac1e575f013f3f58b"),
	"person" : "curtis",
	"basket" : [ ]
}

Comment by Asya Kamsky [ 26/Sep/19 ]

ben@ethika.com are you perhaps looking for an improvement similar to SERVER-42306 in this use case?

Comment by Carl Champain (Inactive) [ 24/Sep/19 ]

Hi ben@ethika.com,

Thanks for the report.
I'll pass it on to the query team.

Kind regards,
Carl

Comment by Ben Rotz [ 20/Sep/19 ]

Put another way, it would be nice if there was just a "project all other fields using $first" in the $group stage. Having to re-type is not only cumbersome, but potentially problematic to maintain as more fields are added to the source collection.

Generated at Thu Feb 08 05:03:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.