[SERVER-14876] support $elemMatch in aggregation $project operator Created: 13/Aug/14  Updated: 25/Jun/15  Resolved: 20/May/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ben Rotz Assignee: Unassigned
Resolution: Duplicate Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-17943 add $filter expression to $project to... Closed
Participants:

 Description   

Why can't $elemMatch be used in the $project phase? It would allow me to easily select an element out of an array.

Example:

db.person.aggregate([
  { "$project": {'main_event' :
    {'$events_array': {'$elemMatch' : {
      'event_status' : 'main'
    }}}
  }}
]);

Basically what I'm looking for is an easier way to convert an array into a scalar without having to $unwind and then re-$group, which is taxing for proc time.



 Comments   
Comment by Charlie Swanson [ 22/May/15 ]

Hi Laurent,

Not yet, but it is very similar to the $map operator, except instead of $in to specify the transformation, it uses $cond to specify whether to include that element in the result array.

Charlie

Comment by Laurent Dollé [ 22/May/15 ]

Hi Charlie,
Thanks for the heads-up.
Is there any public documentation already available on this new operator?

– Laurent

Comment by Charlie Swanson [ 20/May/15 ]

This use case is handled by the new $filter expression.

Comment by Asya Kamsky [ 08/Apr/15 ]

There is a workaround described for this in a related ticket:

https://jira.mongodb.org/browse/SERVER-6612?focusedCommentId=806641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-806641

It involves using $project with $setDifference operator and avoids unwinding and re-grouping.

Comment by Oleg Rekutin [ 07/Jan/15 ]

I have a patch against 2.4 to implement this functionality as a new projection operator using the following syntax:

$elemMatch: {
    "$sourceArrayFieldPath": matchExpression
}

This will need rework to apply to 2.6+ (will actually be cleaner, I think), but meanwhile the 2.4 patch for this feature is: https://github.com/astral303/mongo/commit/b1de11d7af740676db4c9eeb85077ba6a4c2d103

For example, using Ben's originally reported data, here's a working aggregation and its result:

res = c.aggregate([
  {
    '$project': {
      'vice_president': {
        '$elemMatch': {
          '$people': {
            'a': 'vice president'
          }
        }
      }
    }
    },
  {
    '$group': {
      '_id': '$vice_president.name',
      'records': { $sum: 1 }
    }
    },
  {
    '$sort': { _id: 1 }
    }
]);
 
assert.eq(res.result, [{
  _id: 'fred',
  records: 1
}, {
  _id: 'joe',
  records: 3
}]);

This syntax works with arrays of primitives too. You can see all the test cases here:
https://github.com/astral303/mongo/blob/r2.4.12-elemMatchInProject/jstests/aggregation/bugs/elemMatchInProject.js

I just wanted to get feedback on this syntax. I can then implement this against 2.6+.

In my implementation, you cannot reference fields using the "$reference" syntax inside of the matcher expression itself. But I can solve very many problems without that ability, so I think it's still very worthwhile to have this.

Comment by Ben Rotz [ 03/Oct/14 ]

For what it's worth, I tried to see if the $elemMatch operator was available using the "$let" command in the $project phase... and it is not. So no solution there, either. I'm running into an issue where even using the "current method" in the last code example above isn't working for my scenario, because the items inside the "people" array are being filtered out due to some match conditions after an unwind, but I would still like some of the information the array items that were filtered out... unfortunately not possible looks like, since I can't $project anything from an array into anything else. Please consider supporting $elemMatch in the project phase, it would have saved me tons of headache! Thanks!

I have made a google posting about this here: https://groups.google.com/forum/#!topic/mongodb-user/4OrrFi98RmI

Comment by Ben Rotz [ 02/Oct/14 ]

Just commenting to let you know there is still a need for me for this feature, I would really love it.

A couple of clarifying points:

1) Right now, you can't use $elemMatch in the $project phase in the aggregation framework. Still a problem.

2) The $project phase for a find() does not have the ability to "rename" a field on its way out. That could be a useful addition as well, but I'm more concerned about the aggregation framework side of things. Assuming that $elemMatch could be used in the aggregation framework, It would be very useful if $elemMatch could be used underneath a field. For example, if I have a collection like

> db.test.drop();
> db.test.insert({'people' : [{'a' : 'president', 'name' : 'bob'}, {'a' : 'vice president', 'name' : 'fred'}, {'a' : 'secretary', 'name' : 'john'}]});
> db.test.insert({'people' : [{'a' : 'president', 'name' : 'frank'}, {'a' : 'vice president', 'name' : 'joe'}, {'a' : 'secretary', 'name' : 'edna'}]});
> db.test.insert({'people' : [{'a' : 'president', 'name' : 'frank'}, {'a' : 'vice president', 'name' : 'joe'}, {'a' : 'secretary', 'name' : 'edna'}]});
> db.test.insert({'people' : [{'a' : 'president', 'name' : 'frank'}, {'a' : 'vice president', 'name' : 'joe'}, {'a' : 'secretary', 'name' : 'edna'}]});
> db.test.findOne();
{
        "_id" : ObjectId("542cd1498211bad4264fdb15"),
        "people" : [
                {
                        "a" : "president",
                        "name" : "bob"
                },
                {
                        "a" : "vice president",
                        "name" : "fred"
                },
                {
                        "a" : "secretary",
                        "name" : "john"
                }
        ]
}

It would be nice to run

### the below is an example, it obviously won't work in the shell as $elemMatch doesn't work in aggregate()
db.test.aggregate([{'$project' :
    {'super_president' : {'$people' : {'$elemMatch' : {'a' : 'president'}}}},
    {'great_vice_president' : {'$people' : {'$elemMatch' : {'a' : 'vice president'}}}},
    {'subpar_secretary' : {'$people' : {'$elemMatch' : {'a' : 'secretary'}}}}
}]).pretty();
"_id" : ObjectId("542cd1498211bad4264fdb15"),
"super_president" : [{"a" : "president", "name" : "bob"}],
"great_vice_president" : [{"a" : "vice president", "name" : "fred"}],
"subpar_secretary" : [{"a" : "secretary", "name" : "john"}]

3) Right now, using $elemMatch in the find() command returns an array of the FIRST matched item. And it returns it in an array. This seems like a unhelpful combination. If $elemMatch will only EVER return one result, the result may as well be in scalar form. I see that there is already a ticket for matching MULTIPLE elements in the array at SERVER-6612 suggesting the syntax $elemMatchAll. Perhaps a feature where the FIRST element is returned as a scalar (vs. a single item array) could be named $elemMatchScalar or something. This becomes extremely useful in the aggregation framework as it allows you to project an item from the array, and then EASILY run it through a $group stage. Currently, you have to $unwind, then $group once to get rid of the bad items, then finally proceed with the original intended $group. In other words, $project => $group seems much more efficient than $unwind => $group => $group.

Using the example collection, a query to group by the name of the president and the number of records becomes trivial.

## aggregate query current method, much less efficient when considering millions of records
db.test.aggregate([
{'$unwind' : '$people'},
{'$group' : {'_id' : '$_id', 'vice_president_name' : {'$max' : {'$cond' : [{'$eq' : ['$people.a', 'vice president']},'$people.name',null]}}}},
{'$group' : {'_id' : '$vice_president_name', 'records' : {$sum : 1}}}
]).pretty();
 
## possible aggregate query if above features were implemented
db.test.aggregate([
{'$project' : {'vice_president' : {'people' : {'$elemMatchScalar' : {'a' : 'vice president'}}}}},
{'$group' : {'_id' : '$vice_president.name', 'records' : {$sum : 1}}}
]).pretty();

Generated at Thu Feb 08 03:36:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.