[SERVER-58954] Refactor and unify the code to optimize expressions in projections Created: 29/Jul/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Andrii Dobroshynski (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-57749 Query marked as SBE compatible, but l... Closed
Gantt Dependency
has to be done after SERVER-57749 Query marked as SBE compatible, but l... Closed
Related
is related to SERVER-60203 Translation of the groupBy expression... Closed
is related to SERVER-60226 Ban sugared $group stages from being ... Closed
Assigned Teams:
Query Execution
Participants:

 Description   

As part of the work in SERVER-57749 we looked into cleaning up / standardizing code to optimize projection expressions, but found that doing so would involve a large overhaul of the existing code and that it is beyond the scope of the original ticket (which was about a late call to optimize() on a projection in SBE). 

To fix the late optimization, and to avoid 'find' in its current state producing sub-optimal plans for some queries, we instead suggest optimizing earlier, when building the projection AST. However, because we currently also have an optimize() method on ProjectionNode, there are now two ways in which a projection can potentially be optimized:

  1. Parse the projection
  2. Build ProjectionExecutor
  3. Optimize ProjectionExecutor

(new method)

  1. Parse the projection
  2. Optimize it
  3. Build ProjectionExecutor
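
The two orderings above can be sketched with toy types. This is purely illustrative: ProjectionAst, ProjectionExecutor, parseProjection() and the constant-folding optimize() below are simplified, hypothetical stand-ins rather than the real server classes, and the sketch assumes the executor works on its own copy of the AST. It only shows that in the first ordering the AST itself stays unoptimized, while in the new ordering both the AST and the executor built from it are already optimized.

```cpp
#include <numeric>
#include <vector>

// Hypothetical, heavily simplified projection AST: a single computed field
// that adds constants together (think of {$add: [1, 2, 3]}).
struct ProjectionAst {
    std::vector<int> addends;

    // Constant-fold all addends into a single value.
    void optimize() {
        int sum = std::accumulate(addends.begin(), addends.end(), 0);
        addends.assign(1, sum);
    }
};

// Hypothetical executor that holds its own copy of the AST (an assumption).
struct ProjectionExecutor {
    ProjectionAst ast;
    void optimize() { ast.optimize(); }
};

// Stand-in for parsing: always yields the same toy projection.
ProjectionAst parseProjection() {
    ProjectionAst ast;
    ast.addends = {1, 2, 3};
    return ast;
}

// What each flow leaves behind: the AST and the executor built from it.
struct Result {
    ProjectionAst ast;
    ProjectionExecutor executor;
};

// First ordering: parse, build the executor, then optimize the executor.
Result buildThenOptimize() {
    Result r;
    r.ast = parseProjection();
    r.executor.ast = r.ast;
    r.executor.optimize();  // only the executor's copy is optimized
    return r;
}

// New ordering: parse, optimize the AST, then build the executor.
Result optimizeThenBuild() {
    Result r;
    r.ast = parseProjection();
    r.ast.optimize();        // the AST is optimized before anything uses it
    r.executor.ast = r.ast;
    return r;
}
```

In this toy model, anything that reads the AST directly sees the folded expression only in the second ordering.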

The goal is to clean this up, ideally by removing the optimize() method on the ProjectionNode. This is difficult, however, because we rely on these optimizations in certain cases with $lookup. In that situation we need to make sure that $lookup inner pipelines are not optimized too early, as that might lead to incorrect positioning of the caching stage if certain variables have been optimized away. See this comment for this description:

 

One approach, suggested by Ian, is to add a new "prepare" method to DocumentSource that would mostly do nothing, but in the case of DocumentSourceSingleDocumentTransformation could be used for optimization. This would also require us to plumb a call to optimize through to the 'TransformerInterface' of this DocumentSource for projections.
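
A rough sketch of what such a hook might look like, using simplified stand-in classes rather than the real server types. Everything here is hypothetical: the class shapes, the signature of prepare(), the ProjectionTransformer type and its 'optimized' flag are assumptions made for illustration. It only shows the plumbing: the base-class default does nothing, while the single-document-transformation stage forwards the call to its transformer's optimize().

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the transformer interface; not the real class.
struct TransformerInterface {
    virtual ~TransformerInterface() = default;
    virtual void optimize() = 0;
};

// Hypothetical projection transformer that records whether it was optimized.
struct ProjectionTransformer : TransformerInterface {
    bool optimized = false;
    void optimize() override { optimized = true; }
};

// Hypothetical stand-in for DocumentSource carrying the proposed hook.
struct DocumentSource {
    virtual ~DocumentSource() = default;
    // Default: most stages have nothing to prepare.
    virtual void prepare() {}
};

// Stand-in for the single-document-transformation stage: prepare() forwards
// to the transformer so the projection is optimized at this point.
struct DocumentSourceSingleDocumentTransformation : DocumentSource {
    explicit DocumentSourceSingleDocumentTransformation(
        std::unique_ptr<TransformerInterface> t)
        : transformer(std::move(t)) {}

    void prepare() override { transformer->optimize(); }

    std::unique_ptr<TransformerInterface> transformer;
};

// Usage: returns true if calling prepare() through the base-class interface
// optimized the projection transformer.
bool prepareOptimizesProjection() {
    auto transformer = std::make_unique<ProjectionTransformer>();
    ProjectionTransformer* raw = transformer.get();
    std::unique_ptr<DocumentSource> stage =
        std::make_unique<DocumentSourceSingleDocumentTransformation>(
            std::move(transformer));
    stage->prepare();
    return raw->optimized;
}
```

Keeping the default a no-op means other stages are unaffected, and the caller decides when prepare() runs, which is where the $lookup timing concern would have to be handled.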



 Comments   
Comment by Yoon Soo Kim [ 24/Sep/21 ]

$bucket/$bucketAuto/$sortByCount are sugared $group stages. Just like SERVER-57749, SERVER-60203 is related to the late optimization of expressions, because $bucket/$bucketAuto/$sortByCount are desugared first and then optimized.

With the prospect of resolving SERVER-58954, sugared $group stages are being banned from being pushed down to SBE for the $group pushdown project.

Generated at Thu Feb 08 05:45:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.