- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Query Optimization
There is currently a circular dependency between optimize() and optimizeAt(). Some optimizeAt() methods rely on optimize() having already been called, while other optimizeAt() methods rely on optimizeAt() having completed before any call to optimize().
This is most evident in document_source_internal_unpack_bucket and document_source_sequential_document_cache. The internalUnpackBucket stage calls optimize() (example here and here) and relies on it to simplify expressions when considering certain optimizations and rewrites. At the same time, the sequentialDocumentCache stage decides in its doOptimizeAt() method whether it can cache any part of the pipeline. That logic relies on let variables not yet being inlined, and inlining happens in calls to optimize(). Therefore, if we call optimize() from internalUnpackBucket's doOptimizeAt(), the cache will incorrectly cache results (we added logic to avoid this for now). However, if we don't call it, certain rewrites in internalUnpackBucket will not occur.
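To make the ordering hazard concrete, here is a minimal toy model. It is only a sketch: `ToyStage`, `toyOptimize`, and `toyCachePrefix` are hypothetical names invented for illustration and are not the server's classes or methods. It assumes the cache decides what is cacheable by checking whether a stage still references a let variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy stage, not a real DocumentSource.
struct ToyStage {
    std::string name;
    bool referencesLetVariable;  // expression still refers to a let variable by name
    bool cached;                 // set when the toy cache decides to cache this stage
};

// Stand-in for optimize(): inlining replaces the variable reference with a constant,
// so the reference is no longer visible afterwards.
void toyOptimize(ToyStage& stage) {
    stage.referencesLetVariable = false;
}

// Stand-in for the cache's doOptimizeAt(): cache the longest prefix of stages that
// does not visibly reference a let variable. Returns the number of cached stages.
std::size_t toyCachePrefix(std::vector<ToyStage>& stages) {
    std::size_t cachedCount = 0;
    for (auto& stage : stages) {
        if (stage.referencesLetVariable) {
            break;  // this stage depends on the variable, stop caching here
        }
        stage.cached = true;
        ++cachedCount;
    }
    return cachedCount;
}
```

If `toyCachePrefix` runs before `toyOptimize`, only the variable-independent prefix is cached. If `toyOptimize` runs first, the variable reference is hidden and the toy cache also caches a stage whose result actually depends on the variable, which mirrors the incorrect caching described above.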
Here are some specific examples which rely on calls to optimize() within the internalUnpackBucket's doOptimizeAt():
- `$group` rewrites that avoid bucket unpacking rely on `optimize()` being called. For example, for `$dateTrunc` on the time field, we rely on `ExpressionDateTrunc::optimize()` being called.
- The optimization that creates a loose predicate and pushes it in front of the bucket unpack stage relies on `optimize()` being called on the eventFilter.
We should remove this circular dependency, and assume that optimizeAt() is always called on the entire pipeline before any call to optimize().
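One possible shape for the proposed ordering is a two-phase driver. The sketch below is not the server's implementation: `SketchStage` and `optimizeTwoPhase` are hypothetical names, and the real optimizeAt() can reorder or remove stages, which this toy ignores. It only illustrates the invariant that every optimizeAt() call finishes before the first optimize() call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stage that records which hook was called, in order.
struct SketchStage {
    std::string name;

    void optimizeAt(std::vector<std::string>& log) const {
        log.push_back("optimizeAt:" + name);
    }

    void optimize(std::vector<std::string>& log) const {
        log.push_back("optimize:" + name);
    }
};

// Phase 1 runs optimizeAt() across the entire pipeline; phase 2 runs optimize()
// on each stage. Returns the call log so the ordering can be checked.
std::vector<std::string> optimizeTwoPhase(const std::vector<SketchStage>& pipeline) {
    std::vector<std::string> log;
    for (const auto& stage : pipeline) {
        stage.optimizeAt(log);  // pipeline-level rewrites see un-inlined expressions
    }
    for (const auto& stage : pipeline) {
        stage.optimize(log);  // per-stage simplification, including variable inlining
    }
    return log;
}
```

Under this ordering, no doOptimizeAt() implementation could assume that optimize() has already run, so rewrites that currently depend on simplified expressions (such as the internalUnpackBucket cases listed above) would need another way to obtain them.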
- is depended on by: SERVER-79692 Duplicated predicates pushed down into collection access for $match over time-series (Open)