[SERVER-27744] merge sequential $project stages and $addFields stages when appropriate Created: 18/Jan/17  Updated: 30/Jan/24

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Adinoyi Omuya
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 0
Labels: neweng, optimization, patrick
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-29010 Coalesce adjacent exclusion $project ... Closed
Related
related to SERVER-54078 [SBE] Improve perf of the bestbuy_agg... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

Consider the following three-stage pipeline:

{"$project": {"__joinedPipeline_1_c":"$c","__joinedPipeline_1_e":"$e","_id":"$__joined_bar._id","c":"$__joined_bar.c"}},
{"$project": {"__joinedPipeline_1_c":1,"__joinedPipeline_1_e":1,"_id":1,"bar_DOT__id":"$_id","bar_DOT_c":"$c","c":1}},
{"$project":{"bar_DOT_c":"$c","foo_DOT_e":"$__joinedPipeline_1_e"}}


The pipeline can be consolidated into a single $project stage:

{"$project":{"bar_DOT_c":"$__joined_bar.c","foo_DOT_e":"$e"}}



 Comments   
Comment by Asya Kamsky [ 06/Oct/18 ]

Here are detailed numbers, on the same collection with the same plan, running $count after a single $project that removes seven fields versus seven $project stages that each remove a single field (after several warm-up runs):

Single $project: 57ms to 66ms over four runs.
Seven $projects: 112ms to 119ms over four runs.
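
For reference, the two pipeline shapes being compared can be sketched roughly as below. This is only an illustration in Python: the field names f1 through f7 and the $count output name "n" are hypothetical, since the actual collection and fields used in the benchmark are not stated here.

```python
# Hypothetical field names; the benchmark's real fields are not stated.
fields = [f"f{i}" for i in range(1, 8)]

# Shape 1: a single exclusion $project removing all seven fields, then $count.
single_project = [
    {"$project": {name: 0 for name in fields}},
    {"$count": "n"},
]

# Shape 2: seven exclusion $project stages, one field each, then $count.
seven_projects = [{"$project": {name: 0}} for name in fields] + [{"$count": "n"}]
```

Either list could then be passed to an aggregate call against a test collection to time the two shapes.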

Comment by Asya Kamsky [ 01/Oct/18 ]

I just did some quick benchmarking on an "extreme" case: a pipeline of only N sequential $project or sequential $addFields stages, compared with a single equivalent stage. With about seven stages, the execution time on the server roughly doubles. So this is a worthwhile effort.

Comment by Asya Kamsky [ 07/Apr/18 ]

Yes, the example given is coalesceable. However, the three-$project example in the description gives a result whose _id field equals the original __joined_bar._id, whereas the single-stage version returns the document's original _id. I think the single combined stage has to be corrected to:

{"$project":{_id:"$__joined_bar._id","bar_DOT_c":"$__joined_bar.c","foo_DOT_e":"$e"}}
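
The _id difference can be checked with a small simulation. The sketch below is a toy model in Python, not the server's implementation: it assumes a $project whose values are either 1 (include) or a "$path" string, and that _id is carried through unless the stage mentions it. The sample document and its values are made up for illustration.

```python
_MISSING = object()

def get_path(doc, path):
    """Resolve a dotted field path such as '__joined_bar.c'; _MISSING if absent."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return _MISSING
        cur = cur[part]
    return cur

def project(doc, spec):
    """Toy inclusion-style $project: values are 1 (include) or '$path' (copy)."""
    out = {}
    # Assumption: _id is kept by default unless the spec mentions it.
    if "_id" not in spec and "_id" in doc:
        out["_id"] = doc["_id"]
    for key, val in spec.items():
        if val == 1:
            value = doc.get(key, _MISSING)
        else:
            value = get_path(doc, val[1:])  # strip the leading '$'
        if value is not _MISSING:
            out[key] = value
    return out

def run(doc, stages):
    """Apply each toy $project spec in order."""
    for spec in stages:
        doc = project(doc, spec)
    return doc

three_stages = [
    {"__joinedPipeline_1_c": "$c", "__joinedPipeline_1_e": "$e",
     "_id": "$__joined_bar._id", "c": "$__joined_bar.c"},
    {"__joinedPipeline_1_c": 1, "__joinedPipeline_1_e": 1, "_id": 1,
     "bar_DOT__id": "$_id", "bar_DOT_c": "$c", "c": 1},
    {"bar_DOT_c": "$c", "foo_DOT_e": "$__joinedPipeline_1_e"},
]
description_single = [{"bar_DOT_c": "$__joined_bar.c", "foo_DOT_e": "$e"}]
corrected_single = [{"_id": "$__joined_bar._id",
                     "bar_DOT_c": "$__joined_bar.c", "foo_DOT_e": "$e"}]

# Made-up sample document.
doc = {"_id": 1, "c": 10, "e": 20, "__joined_bar": {"_id": 99, "c": 30}}

result_three = run(doc, three_stages)
result_description = run(doc, description_single)
result_corrected = run(doc, corrected_single)
```

Under these assumptions, result_three and result_corrected agree (both carry _id 99 from __joined_bar), while result_description differs only in _id, which stays at the document's own value of 1.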

As for combining inclusions and exclusions: while the example you gave, with all-inclusion $project stages, is combinable, I was pointing out that mixing $project stages with inclusions and $project stages with exclusions is not really possible.

The tricky part will be the "when appropriate" qualification.

Comment by Adinoyi Omuya [ 08/Dec/17 ]

Do you mean that since the expected coalesced $project excludes _id (an exclusion), the server won't be able to coalesce the 3 stages? If so, why? Why would multiple $project stages that contain inclusion/exclusion directives preclude coalescing?

Comment by Adinoyi Omuya [ 20/Sep/17 ]

I'm not following, Asya. I think the example I included in the description should be coalesce-able. Do you agree? If not, why won't it be possible?

Comment by Asya Kamsky [ 16/May/17 ]

I believe only all inclusion or all exclusion stages can be coalesced (i.e. we can't coalesce an exclusion project with an $addFields or inclusion $project).
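
The all-exclusion case is the simplest to picture: excluding field A and then field B is the same as excluding both at once. A minimal sketch in Python, assuming flat top-level field names only (dotted or overlapping paths are deliberately left out) and using the hypothetical helper name merge_exclusion_projects:

```python
def merge_exclusion_projects(pipeline):
    """Coalesce runs of adjacent exclusion-only $project stages (toy sketch)."""
    def is_exclusion(stage):
        spec = stage.get("$project")
        return (
            isinstance(spec, dict)
            and len(spec) > 0
            # Only plain exclusions of top-level fields are handled here.
            and all(v in (0, False) and "." not in k for k, v in spec.items())
        )

    merged = []
    for stage in pipeline:
        if merged and is_exclusion(stage) and is_exclusion(merged[-1]):
            # Excluding A and then B is the same as excluding A and B together.
            merged[-1] = {"$project": {**merged[-1]["$project"], **stage["$project"]}}
        else:
            # Inclusion stages and other stage types are left untouched.
            merged.append(stage)
    return merged

pipeline = [
    {"$project": {"a": 0}},
    {"$project": {"b": 0}},
    {"$project": {"c": 1}},   # inclusion: breaks the run of exclusions
    {"$project": {"d": 0}},
    {"$count": "n"},
]
coalesced = merge_exclusion_projects(pipeline)
```

Here the first two stages collapse into one, while the inclusion stage and the exclusion that follows it are left as separate stages.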

Comment by Asya Kamsky [ 28/Apr/17 ]

And answering my own question: after testing for SERVER-29010, coalescing these will make the pipeline run faster.

In addition, coalescing adjacent $addFields stages will also improve performance - this ticket can track both.

Comment by Asya Kamsky [ 26/Jan/17 ]

It'll make the pipeline shorter - would this have any measurable performance improvement?
