[SERVER-30474] leading $facet with each facet beginning with $match should add $match before $facet Created: 02/Aug/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Vick Mena (Inactive) Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 9
Labels: optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DOCS-13879 Document that $facet cannot use indexes Closed
is related to SERVER-34428 $facet aggregation stage does not use... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Imagine you have an aggregation pipeline that looks something like:

db.collection.aggregate([
  {$facet: {
    xs: [
      {$match: {x: "SOMETHING"}},
      {$group: {...}}
    ],
    ys: [
      {$match: {y: "SOMETHING ELSE"}},
      {$unwind: "$y"},
      ...
    ]
  }}
])

Such a pipeline would not provide any predicates for the query system to use to begin the pipeline. This pipeline might benefit from being optimized to something more like:

db.collection.aggregate([
  {$match: {$or: [{x: "SOMETHING"}, {y: "SOMETHING ELSE"}]}},
  {$facet: {
    xs: [
      {$match: {x: "SOMETHING"}},
      {$group: {...}}
    ],
    ys: [
      {$match: {y: "SOMETHING ELSE"}},
      {$unwind: "$y"},
      ...
    ]
  }}
])

Original Description

$facet is not smart enough to use an index. if the only stage in a pipeline is $facet, the planning system will simply default to a COLLSCAN.



 Comments   
Comment by Asya Kamsky [ 01/Sep/20 ]

Ignore my previous comment - was testing a different issue

Comment by DANIELE Tassone [ 06/Jun/20 ]

While I was using $facet to try optimise a complex query scenario, I got this blocking issue.
Would be great to improve this feature.

Comment by Asya Kamsky [ 02/Jan/20 ]

> If I add $match with $or clause having all conditions which are the in facet will it be any beneficial ?

Its benefit will depend on how selective it is. If it's selective relative to total size of the collection then it should benefit the overall performance significantly (assuming there is an index that can be used).

Comment by Rishi Mishra [ 18/Dec/19 ]

Is this feature getting implemented anytime soon ? 

If I add $match with $or clause having all conditions which are the in facet will it be any beneficial ? 

does adding $match at first , will reduce number of document to be scanned by facet or it will go with all collection document irrespective of $match ?

Comment by Asya Kamsky [ 16/Jan/19 ]

inking007 please watch (and upvote) SERVER-23201 for Union equivalent expression.

Comment by Intae Kim [ 15/Jan/19 ]

On Mongodb CE 3.6.9, I  tried using 'facet'  to make query like UNION of RDBMS's SQL. but Facet was very slow even though I wrote query with $match of indexed fields. Is there any other query like  UNION of SQL?

Comment by aniket patil [ 03/Jan/19 ]

Any update on this. How to pushed up this issue from backlog bucket.

Comment by aniket patil [ 21/Dec/18 ]

It looks like this is must needed functionality to take advantage of $facet aggregation operator. If $facet aggregation stage does not use index created for the sub pipelines then the purpose of use $facet is killed. Please take this issue into account.

Comment by Kevin Pulo [ 03/Jun/18 ]

The approach of promoting the $match out of the $facet is fine for exact matches, but not for more complex situations, eg.

db.collection.aggregate([
  {$facet: {
    x1: [
      {$match: {x: {$gte: somevalue}}},
      {$limit: 1}
    ],
    x2: [
      {$match: {x: {$lte: someothervalue}}},
      {$limit: 1}
    ]
  }}
])

When run as independent aggregations, these each scan just 1 document. But when run as above inside $facet, the whole collection gets scanned.

Rewriting this as suggested above would give:

db.collection.aggregate([
  {$match: { $or: [ {x: {$gte: somevalue}}, {x: {$lte: someothervalue}}]}},
  {$facet: {
    x1: [
      {$match: {x: {$gte: somevalue}}},
      {$limit: 1}
    ],
    x2: [
      {$match: {x: {$lte: someothervalue}}},
      {$limit: 1}
    ]
  }}
])

In this case, the leading $match likely gives no benefit, since up to the entire collection will still be passed into $facet.

I think what's really needed is, when $facet is the first stage, each of its sub-pipelines should be run as if it was an actual aggregation (ie. with full agg query planning), like the way that $lookup with a pipeline does. This would allow both this aggregation and the one originally posted on this ticket, to run with full efficiency (without any rewriting).

Generated at Thu Feb 08 04:23:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.