[SERVER-34741] Move $match in front of $group if condition is on group key Created: 28/Apr/18  Updated: 29/Oct/23  Resolved: 20/Jun/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.12

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Fixed Votes: 0
Labels: neweng, optimization, performance, read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Query 2019-06-03, Query 2019-06-17, Query 2019-07-01
Participants:
Case:

 Description   

In case of views created on results of a $group, if the subsequent query is filtering on _id field then it should be possible to push that $match in front of $group transforming it into filter against whatever field is being grouped.

Example:

Vew definition:

[ {$group:{_id:"$city", ...}} ] 

Query on view:

{$match: {_id:{$in:["NY", "SF"]}}}

Pipeline transformation:

// from
[ {$group:{_id:"$city", ...}}, {$match: {_id:{$in:["NY", "SF"]}}} ] 
// to
[{$match: {city:{$in:["NY", "SF"]}}}, {$group:{_id:"$city", ...}} ] 



 Comments   
Comment by Githook User [ 16/Dec/20 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@10gen.com'}

Message: SERVER-34741 Move $match in front of $group if condition is on group key

(cherry picked from commit 55e76198b1e41b5dbd5868d2ed8b914da29f0f29)
Branch: v4.2
https://github.com/mongodb/mongo/commit/0aefd7c76f286b9a51296144df9fdb249fec8948

Comment by Githook User [ 20/Jun/19 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@10gen.com'}

Message: SERVER-34741 Move $match in front of $group if condition is on group key
Branch: master
https://github.com/mongodb/mongo/commit/55e76198b1e41b5dbd5868d2ed8b914da29f0f29

Comment by Asya Kamsky [ 06/Jul/18 ]

I got to thinking that this optimization may not be correct to do in the following scenario:

 pipeline=[ {$group:{_id:"$city", ...}}, {$match: {_id:{$in:["NY", "SF"]}}} ]

If the actual usage of the pipeline is inside expressive $lookup:

db.foo.aggregate([
   {$lookup: {
       from: "cities",
       as: "x",
       let: { city:"_id"},
       pipeline: [
            {$group:{_id:"$city", ...}},
            {$match: {$expr:{$eq: [ "$_id", "$$city"] } } }
       ]
   } }

Here the pipeline was written specifically to allow the first (group) stage to be non-correlated, so it will only execute once for outer aggregation, with just a single $match for each document streaming through.

Optimizing the inner expressive lookup pipeline here would defeat the plan by always first doing match on single value and then doing $group - while overall it's a smaller $group, it may be the wrong approach to reduce overall amount of work.

So we should be careful about where this optimization is applied when we implement this.

Generated at Thu Feb 08 04:37:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.