[SERVER-79435] [CQF] Improve common subexpression elimination to remove duplicate projection of nested field Created: 27/Jul/23  Updated: 31/Aug/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ben Shteinfeld Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: M9
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

For the query, 

find({"a.b.c": 1, "a.b.d": 1}) 

where there are non-multikey indexes on "a'" and "a.b", Bonsai might produce the following plan

Root [{p0}]
Filter []
|   EvalFilter []
|   |   Variable [p10]
|   PathGet [b] PathGet [d] PathTraverse [1] PathCompare [Eq] Const [1]
Filter []
|   EvalFilter []
|   |   Variable [p10]
|   PathGet [b] PathGet [c] PathTraverse [1] PathCompare [Eq] Const [1]
PhysicalScan [{'<root>': p0, 'a': p10}, c_8c978cf6-00f0-457e-baa4-a76eba34bbb6]
 
 
[2] filter {(traverseF(getField(getField(s3, "b"), "d"), lambda(l101.0) { (move(l101.0) == 1L) }, false) ?: false)}
[1] filter {(traverseF(getField(getField(s3, "b"), "c"), lambda(l101.0) { (move(l101.0) == 1L) }, false) ?: false)}
[0] scan s2 none none none none none none none lowPriority [s3 = a] @"8c978cf6-00f0-457e-baa4-a76eba34bbb6" true false

Note that the projection for "a" is shared between the two filter nodes as it is pushed down to the physical scan, but the projection for "a.b" is re-evaluated, once by each Filter node.

This plan could be improved by avoiding the duplicate getField("b") by inserting an EvaluationNode between the PhysicalScan and Filter to generate this projection. I think we should run such a rewrite after the memo phases.

Part of this ticket should also be to understand why the existing path fusion rewrites don't handle this case.



 Comments   
Comment by David Storch [ 28/Jul/23 ]

I also wonder if we would get better performance by generating a plan that implements the conjunction as filter(X and Y) rather than filter(X) -> filter(Y). I suppose this might happen naturally anyway if you try to avoid multiple calls to getField("b") in the given example. I'm imaging that you try to generate a let expression which is like let field = getField("b") in traverseF(...) and traverseF(...).

Comment by Anton Korshunov [ 28/Jul/23 ]

Sending this ticket to PM-3223 as this may show up in some of our benchmarks. If not, we will make sure to add the missing benchmarks and look how it can be addressed.

Generated at Thu Feb 08 06:40:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.