[SERVER-85369] Reuse projection for same field during top-level field pushdown Created: 18/Jan/24  Updated: 31/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alya Berciu Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

After SERVER-83441, we use BinaryOp<And> with a pushed-down projection as input for every top-level field, instead of using PathComposeM + PathGets to retrieve each field from a single root projection. There is one case where this can generate a more costly physical plan. Consider the following example query:

{$and: [ { a: 1 }, { a: 2 }, { a: { $gt: 1 }} ]}

Note that unfortunately we don't do interval simplification for this case yet. The plan we generate now post-pushdown is:

Filter []                                                    
|   BinaryOp [And]                                           
|   |   EvalFilter []                                        
|   |   |   Variable [p2]                                    
|   |   PathTraverse [1] PathCompare [Eq] Const [1]
|   BinaryOp [And]                                           
|   |   EvalFilter []                                        
|   |   |   Variable [p2]                                    
|   |   PathTraverse [1] PathCompare [Eq] Const [1]
|   EvalFilter []                                            
|   |   Variable [p2]                                        
|   PathTraverse [1] PathComposeM []
|   |   PathCompare [Lt] Const [""]
|   PathCompare [Gt] Const [1]
PhysicalScan [{'<root>': p0, 'a': p2}, a_e4934641-532e-4772-9816-7cc580c80b28]  

Its good that we can reuse the projection for a- however, we traverse the output of p2 three times (n times for n predicates on the same field). We should avoid the extraneous traverses in this case by generating a plan closer to:

Filter []                                           
|   EvalFilter []                                        
|   |   Variable [p2]
|.  PathTraverse [1] 
|.  PathComposeM                                    
|   |.  PathCompare [Eq] Const [1]
|   PathComposeM
|   |   PathCompare [Eq] Const [1]
|   PathComposeM []
|   |   PathCompare [Lt] Const [""]
|   PathCompare [Gt] Const [1]
...

We would see a performance gain in particular for large collections where field "a" includes large arrays/ the traversal step is expensive.


Generated at Thu Feb 08 06:57:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.