[SERVER-84113] Remove circular dependency between optimize() and optimizeAt() in DocumentSources Created: 12/Dec/23  Updated: 22/Jan/24

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Naama Bareket Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-79692 Duplicated predicates pushed down int... Open
Assigned Teams:
Query Optimization
Participants:

Description

There is currently a circular dependency between optimize() and optimizeAt(). Some optimizeAt() methods rely on optimize() having already been called, while other optimizeAt() methods rely on optimizeAt() being complete before any call to optimize().

This is specifically evident within document_source_internal_unpack_bucket and document_source_sequential_document_cache. The internalUnpackBucket stage calls optimize() from its doOptimizeAt() (example here and here) and relies on it to simplify expressions when considering certain optimizations and rewrites. At the same time, the sequentialDocumentCache decides in its doOptimizeAt() method whether it can cache any part of the pipeline. That logic relies on let variables not yet being inlined, and inlining happens in calls to optimize(). Therefore, if we call optimize() from the internalUnpackBucket's doOptimizeAt(), the cache will incorrectly cache results (we added logic to avoid this for now). However, if we don't, certain rewrites in the internalUnpackBucket will not occur.
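
The ordering hazard on the cache side can be illustrated with a minimal sketch. Everything below is a hypothetical toy model (ToyStage, cacheablePrefixLength, and the two driver functions are invented for illustration and are not the server's classes or signatures). It assumes only what is described above: optimize() inlines let variables, and the cache decides what to cache by looking for stages that still reference them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: these types only imitate the ordering problem.
struct ToyStage {
    std::string name;
    bool referencesLetVariable;  // does the stage still visibly read a 'let' variable?

    ToyStage(std::string n, bool refsLet)
        : name(std::move(n)), referencesLetVariable(refsLet) {}

    // Stand-in for optimize(): inlining the variable's current value hides the
    // dependency from anything that inspects the stage afterwards.
    void optimize() { referencesLetVariable = false; }
};

// Stand-in for the cache's doOptimizeAt() decision: the cacheable prefix ends
// at the first stage that still visibly depends on a 'let' variable.
std::size_t cacheablePrefixLength(const std::vector<ToyStage>& pipeline) {
    std::size_t length = 0;
    while (length < pipeline.size() && !pipeline[length].referencesLetVariable) {
        ++length;
    }
    assert(length <= pipeline.size());
    return length;
}

// Build the same three-stage pipeline for both orderings.
std::vector<ToyStage> makePipeline() {
    return {ToyStage("constantMatch", false),
            ToyStage("matchOnLetVariable", true),
            ToyStage("project", false)};
}

// Ordering A: the cache decides before anything is inlined.
std::size_t decideThenOptimize() {
    std::vector<ToyStage> pipeline = makePipeline();
    std::size_t prefix = cacheablePrefixLength(pipeline);
    for (auto& stage : pipeline) stage.optimize();
    return prefix;
}

// Ordering B: optimize() runs first, as it would if it were triggered from
// another stage's doOptimizeAt() before the cache made its decision.
std::size_t optimizeThenDecide() {
    std::vector<ToyStage> pipeline = makePipeline();
    for (auto& stage : pipeline) stage.optimize();
    return cacheablePrefixLength(pipeline);
}
```

In this toy model, decideThenOptimize() returns 1 (only the constant stage is cacheable), while optimizeThenDecide() returns 3, because the inlined stage no longer looks variable-dependent and the whole pipeline is treated as cacheable.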

Here are some specific examples which rely on calls to optimize() within the internalUnpackBucket's doOptimizeAt():

  • `$group` rewrites that avoid bucket unpacking rely on `optimize()` being called. For example, for $dateTrunc on the time field, we rely on
    ExpressionDateTrunc::optimize() being called.
  • The optimization in which we create a loose predicate and push it in front of the bucket unpack stage relies on optimize() being called on the eventFilter.

We should remove this circular dependency, and assume that optimizeAt() is always called on the entire pipeline before any call to optimize().
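
One way to picture the proposed invariant is a two-phase driver that finishes every optimizeAt() call before allowing any optimize() call. The sketch below is a hypothetical toy (Phase, PhasedStage, and PhasedPipeline are invented names, not the server's types), and it deliberately ignores that real optimizeAt() calls can reorder or remove stages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types; these are not the server's classes or signatures.
enum class Phase { kOptimizeAt, kOptimize };

struct PhasedStage {
    std::string name;
    explicit PhasedStage(std::string n) : name(std::move(n)) {}
};

struct PhasedPipeline {
    std::vector<PhasedStage> stages;
    std::vector<std::string> log;  // order in which the rewrites ran
    Phase phase = Phase::kOptimizeAt;

    // Stand-in for optimizeAt(): only legal during the first phase, so it can
    // never observe the effects of an earlier optimize() call.
    void optimizeAt(std::size_t i) {
        assert(phase == Phase::kOptimizeAt);
        log.push_back("optimizeAt:" + stages[i].name);
    }

    // Stand-in for optimize(): only legal once every optimizeAt() has run.
    void optimize(std::size_t i) {
        assert(phase == Phase::kOptimize);
        log.push_back("optimize:" + stages[i].name);
    }

    // Phase 1 covers the entire pipeline before phase 2 starts.
    void optimizeAll() {
        for (std::size_t i = 0; i < stages.size(); ++i) optimizeAt(i);
        phase = Phase::kOptimize;
        for (std::size_t i = 0; i < stages.size(); ++i) optimize(i);
    }
};
```

With the phases separated like this, a stage that today needs simplified expressions inside its doOptimizeAt() would need another way to obtain them, which is the part of the work this ticket leaves open.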

Generated at Thu Feb 08 06:54:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.