[SERVER-31082] when $count is at the end of multiple stages that don't change the number of documents in pipeline, those stages can be eliminated
Created: 13/Sep/17  Updated: 19/Jan/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Asya Kamsky
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-73027 [CQF] Test that we eliminate $addFiel... Closed
is related to SERVER-13703 Presence of extraneous $project cause... Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

Something that looks like this:

{$match:{}},
{$project:{}},
{$lookup:{}},
{$addFields:{}},
{$lookup:{}},
{$count:"c"}

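is basically a $match followed by a $count, because none of the stages in between changes the number of documents. It would be nice if the aggregation optimizer had a way of figuring that out and dropping them.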



 Comments   
Comment by David Percy [ 12/Nov/21 ]

Another stage that doesn't change the count is $sort. We could recognize whether or not a $group uses order-sensitive accumulators (such as $first or $push). Since we consider $sum / $add to be commutative and associative, we could eliminate the $sort in a pipeline like [{$sort: ...}, {$group: {_id: null, n: {$sum: 1}}}].
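
A minimal sketch of that check, written against plain pipeline objects rather than the server's stage classes. The function names and the ORDER_SENSITIVE list are hypothetical, and the list is deliberately incomplete:

```javascript
// Assumption: an illustrative, incomplete list of accumulators whose
// result depends on the order of the incoming documents.
const ORDER_SENSITIVE = new Set(["$first", "$last", "$push"]);

// True if any accumulator in the $group spec is order-sensitive.
function groupIsOrderSensitive(groupSpec) {
  return Object.entries(groupSpec)
    .filter(([field]) => field !== "_id")
    .some(([, acc]) => Object.keys(acc).some((op) => ORDER_SENSITIVE.has(op)));
}

// Drop every $sort that is immediately followed by an order-insensitive $group.
function dropRedundantSorts(pipeline) {
  return pipeline.filter((stage, i) => {
    const next = pipeline[i + 1];
    const isSort = "$sort" in stage;
    const nextIsGroup = next !== undefined && "$group" in next;
    return !(isSort && nextIsGroup && !groupIsOrderSensitive(next.$group));
  });
}
```

With this sketch, a $sort ahead of {$group: {_id: null, n: {$sum: 1}}} is removed, while a $sort ahead of a $group that uses $first is kept.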

Comment by Matt Boros [ 12/Nov/21 ]

I think this could be done by changing $count to be a concrete stage instead of an alias for $group $project. When stages like $addFields and $project run their doOptimize function, they will see a $count ahead and remove themselves from the pipeline. Then inside the $count doOptimize, we remove the $count from the pipeline and add $group $project. So $count will always result in a $group $project, but is a concrete type (although it will never show up in a pipeline once optimization is done).
 
I think this would give the same effect as what we have now, except that stages can optimize if they see a $count.

Also it looks like $graphLookup is another stage that does not change the document count.
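
The two phases described in this comment could be sketched roughly as follows. This models the pipeline as plain objects, not the server's C++ stage classes. The function names and the COUNT_PRESERVING set are hypothetical. The $group and $project specs used for desugaring are assumed from the equivalence documented for $count.

```javascript
// Assumption: stages that emit exactly one output document per input document.
const COUNT_PRESERVING = new Set([
  "$project", "$addFields", "$lookup", "$graphLookup", "$sort",
]);

const stageName = (stage) => Object.keys(stage)[0];

// Phase 1: walking right to left, a count-preserving stage removes itself
// when the next surviving stage is a $count.
function removeStagesBeforeCount(pipeline) {
  const kept = [];
  for (let i = pipeline.length - 1; i >= 0; i--) {
    const stage = pipeline[i];
    const next = kept[0];
    const seesCount = next !== undefined && stageName(next) === "$count";
    if (seesCount && COUNT_PRESERVING.has(stageName(stage))) continue;
    kept.unshift(stage);
  }
  return kept;
}

// Phase 2: replace each $count with a $group + $project pair.
function desugarCount(pipeline) {
  return pipeline.flatMap((stage) =>
    stageName(stage) === "$count"
      ? [
          { $group: { _id: null, [stage.$count]: { $sum: 1 } } },
          { $project: { _id: 0 } },
        ]
      : [stage]
  );
}

const optimize = (pipeline) => desugarCount(removeStagesBeforeCount(pipeline));
```

Applied to the pipeline in the description, optimize leaves only the $match followed by the $group and $project. A count-preserving stage that sits before a $match is kept, because the walk stops removing stages as soon as the next surviving stage is not a $count. A real implementation would also have to decide what to do about stages whose expressions can raise errors, since removing them changes that behavior.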

Comment by Asya Kamsky [ 13/Sep/17 ]

I'm guessing that would be part of dependency analysis because this optimization would require figuring out that $count (or any $group which doesn't use any incoming fields) makes all such stages no-ops.
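
As a rough illustration of the "doesn't use any incoming fields" test, the sketch below scans a $group spec for field paths. Both function names are hypothetical, and the check is deliberately conservative: any string value beginning with "$" (including variables such as "$$ROOT") counts as a reference to the input.

```javascript
// True if the value contains a string starting with "$" anywhere inside it.
// Only values are inspected, so operator keys such as "$sum" do not count.
function referencesInput(value) {
  if (typeof value === "string") return value.startsWith("$");
  if (Array.isArray(value)) return value.some(referencesInput);
  if (value !== null && typeof value === "object") {
    return Object.values(value).some(referencesInput);
  }
  return false;
}

// True if the $group can be computed without reading any field of its input.
function groupIgnoresInput(groupSpec) {
  return !referencesInput(groupSpec);
}
```

Under this sketch, {_id: null, n: {$sum: 1}} ignores its input, while {_id: "$a", n: {$sum: 1}} does not.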

Generated at Thu Feb 08 04:25:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.