[SERVER-70780] [CQF] Simplify redundant PathObj / impossible PathArr Created: 21/Oct/22  Updated: 12/Jun/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: Ideas, bonsai-onboarding, optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-77499 [CQF] Add predicates to SargableNode ... Open
Assigned Teams:
Query Optimization
Participants:

 Description   

Take this $elemMatch predicate:

{a: {$elemMatch: {b: 2, c: 3}}}

It expands to this ABT:

|   PathGet [a]
|   PathTraverse [1]
|   PathComposeM []
|   |   PathComposeA []
|   |   |   PathArr []
|   |   PathObj []
|   PathComposeM []
|   |   PathGet [c] PathTraverse [1] PathCompare [Eq] Const [3]
|   PathGet [b] PathTraverse [1] PathCompare [Eq] Const [2]

The last 3 lines correspond to b: 2, c: 3. This predicate can only be true when applied to an object. So the extra PathObj / PathArr checks are redundant: we can replace them with true / false respectively.

It's not enough to check for a PathGet however. A predicate like b: {$exists: false} can be true when applied to a non-object, because it uses PathDefault which can "catch" Nothing. Other nodes like this include PathConstant and PathLambda.



 Comments   
Comment by David Percy [ 01/Jun/23 ]

Initially I thought we should solve this by adding an optimization that takes an ABT and simplifies it. But it looks kind of complicated to detect whether the PathObj is redundant:

  • We have to notice that PathGet [c] ... and PathObj both take the same input, even though they aren't direct siblings.
  • We have to check that PathGet [c] ... is only true for objects.
    • We have to check that its PathTraverse ... child is never true for Nothing.
      • We have to check that its PathCompare ... child is never true for Nothing

This starts to resemble type checking, which we probably should think about more holistically.


So instead, maybe we can look at how we translate $elemMatch to ABT, and try to avoid creating the redundant nodes in the first place.

Maybe "object elemMatch" never needs the PathArr? Maybe that PathArr is only used for nested elemMatch?

Generated at Thu Feb 08 06:17:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.