[SERVER-25957] Optimize $filter + $arrayElemAt to avoid scanning entire array Created: 04/Sep/16  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Xavier Del Castillo Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 1
Labels: neweng, optimization, qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-32739 allow limit in $filter expression Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

If you have a very large array, and want to extract the first element matching some predicate, you should be able to do so without scanning the entire array.

We should optimize the combination of $arrayElemAt and $filter to do so.

Original Description

Why can't $elemMatch be used in the $project phase? It would allow me to easily select an element out of an array.

Example:

db.person.aggregate([
  { "$project": {'main_event' :
    {'$events_array': {'$elemMatch' : {
      'event_status' : 'main'
    }}}
  }}
]);

Basically what I'm looking for is an easier way to convert an array into a scalar without having to $unwind and then re-$group, which is taxing for proc time.



 Comments   
Comment by Charlie Swanson [ 13/Dec/19 ]

The above commit message mentioned the wrong SERVER ticket so this was mistakenly closed. There is a work in progress pull request here.

Comment by Craig Homa [ 13/Dec/19 ]

Re-opening this as it was closed in error.

Comment by Githook User [ 01/May/18 ]

Author:

{'email': 'KevinCybura@gmail.com', 'name': 'KevinCybura'}

Message: SERVER-25957 Optimize $indexOfArray when array argument is constant.

Signed-off-by: Charlie Swanson <charlie.swanson@mongodb.com>

Closes #1229
Branch: master
https://github.com/mongodb/mongo/commit/ae01d46b2d70ce2491cc01ef131dd101089c6d21

Comment by Asya Kamsky [ 29/Mar/18 ]

Please note that:

For answering the question "Is there an element in this array that matches this?"

This should not be done with $filter in aggregation expressions, but with $in.

Comment by Asya Kamsky [ 28/Jun/17 ]

This may be useful as a general optimization not just for $arrayElemAt but for any sort of $slice of the result array. In other words, it would be useful to be able to short circuit out of $filter early if the full result array will not be used.

{$project:{newarr:{$slice:[
          {$filter:{input:"$bigArray", cond: { ... } } },
          0,
          5
]}}}

The above should be able to stop processing "$bigArray" in $filter as soon as 5 elements have been generated from $filter expression.

Comment by Xavier Del Castillo [ 28/Sep/16 ]

Thanks for considering this request.

Comment by Charlie Swanson [ 09/Sep/16 ]

xdc, This seems like a legitimate request, at least as a performance improvement. As you noticed above, you can use $filter to do most of the work, so if you combine that with an {$arrayElemAt: [{$filter: ...}, 0]} it should do what you want.

I'll convert this ticket into a request for the above expression to be optimized not to scan the entire array, instead stopping at the first matching element.

Comment by Xavier Del Castillo [ 04/Sep/16 ]

This is a clone task of: SERVER-14876

The resolution was that:
> This use case is handled by the new $filter expression.

$filter was implemented on: https://jira.mongodb.org/browse/SERVER-17943

However, I do not think `$filter` and `$elemMatch` are equal - according to the docs `$elemMatch` stops looking when it matches at least one, potentially stopping scanning early - however `$filter` will scan the whole array, even if the first element in the array matches the condition, returning the subset of elements passing the condition. For answering the question "Is there an element in this array that matches this?" `$elemMatch` is the correct choice, `$filter` can answer that question but does unnecessary work.

Thoughts on the above use case?

Generated at Thu Feb 08 04:10:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.