[SERVER-49306] Optimization for mid-pipeline $project stages Created: 02/Jul/20  Updated: 07/Jul/20  Resolved: 07/Jul/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Harshad Dhavale Assignee: Asya Kamsky
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13703 Presence of extraneous $project cause... Backlog
Participants:

 Description   

In general, the server automatically optimizes aggregation pipelines so that they include only the data that is used in the pipeline and/or output at the end. Adding a $project in an earlier stage doesn't really limit the amount of data used by the stages that follow, because pipeline dependency analysis already determines which fields the pipeline needs. A mid-pipeline $project can therefore be redundant, and it can also interfere with that dependency analysis, preventing it from determining which fields are needed.
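As an illustration of the redundancy described above, here is a minimal sketch of two pipelines written as plain Python lists in the style a driver would accept. The collection shape (`status`, `customerId`, `amount`) and the variable and helper names are assumptions made up for this example; they are not taken from the ticket. The $group stage reads only `customerId` and `amount`, so the $project in front of it does not change the result.

```python
# Hypothetical "orders" documents with fields: status, customerId, amount.
# Both pipelines produce the same per-customer totals.
with_mid_project = [
    {"$match": {"status": "shipped"}},
    # Mid-pipeline $project: redundant here, because $group below
    # only reads customerId and amount anyway.
    {"$project": {"customerId": 1, "amount": 1}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
]

without_mid_project = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
]


def stage_names(pipeline):
    """Return the operator name of each stage, in order."""
    return [next(iter(stage)) for stage in pipeline]


print(stage_names(with_mid_project))
print(stage_names(without_mid_project))
```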

$project is typically intended only to rename fields or reshape the data to be output. In most cases it should therefore be placed only at the end of an aggregation pipeline, and in many cases it can be avoided altogether.

This is an enhancement request to optimize mid-pipeline $project stages, possibly by converting them to $addFields, so that they don't interfere with pipeline dependency analysis.
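To make "mid-pipeline $project" concrete, the following sketch flags $project stages that are not the final stage of a pipeline. It is a client-side illustration only, not the server's optimizer; the function name `mid_pipeline_projects` and the sample pipeline are hypothetical. It deliberately does not attempt the $addFields conversion, because an inclusion $project and an $addFields stage are not interchangeable in general.

```python
def mid_pipeline_projects(pipeline):
    """Return the indexes of $project stages that are not the last stage."""
    last = len(pipeline) - 1
    return [
        i
        for i, stage in enumerate(pipeline)
        if "$project" in stage and i != last
    ]


# Hypothetical pipeline: one $project in the middle, one at the end.
example_pipeline = [
    {"$match": {"status": "shipped"}},
    {"$project": {"customerId": 1, "amount": 1}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    {"$project": {"_id": 0, "customer": "$_id", "total": 1}},
]

# Only the stage at index 1 is flagged; the final $project reshapes the output.
print(mid_pipeline_projects(example_pipeline))
```

Only the stage at index 1 is reported. The trailing $project is left alone because renaming and reshaping the output is the use the description considers appropriate.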



 Comments   
Comment by Asya Kamsky [ 07/Jul/20 ]

This looks to be a duplicate of the existing ticket SERVER-13703, so marking it as such.
