[SERVER-68429] Investigate eliminating repeated path traversals in SBE Created: 29/Jul/22  Updated: 27/Mar/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Mihai Andrei Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-69510 Make SBE Stage Builder for match expr... Closed
related to SERVER-69616 Use expressions instead of stages for... Closed
Assigned Teams:
Query Execution
Participants:
Story Points: 15

 Description   

One of the areas in which SBE's performance can be improved is by eliminating repeated path traversals. This is evident in $or queries such as:

"$or" : [
                {
                    "a" : 1,
                    "b" : 2
                },
                {
                    "a" : 2,
                    "b" : 3
                }
            ] 

The SBE plan is as follows: 

[1] filter {s30}
[1] nlj [s4, s5] [s4, s5]
    left
        [1] scan s4 s5 none none none none [] @"7b34d685-f43c-4e59-86ad-bdf5f77c6234" true false
    right
        [1] limit 1
        [1] union [s30] [
            [s28] [1] project [s28 = true]
            [1] filter {s16}
            [1] limit 1
            [1] union [s16] [
                [s14] [1] project [s14 = false]
                [1] filter {! fillEmpty (s9, false)}
                [1] traverse s9 s8 s6 {s9 || s8} {s9} 1
                from
                    [1] project [s6 = getField (s4, "a")]
                    [1] limit 1
                    [1] coscan
                in
                    [1] project [s8 = fillEmpty (s6 == s7, false)]
                    [1] limit 1
                    [1] coscan
                ,
                [s15] [1] project [s15 = fillEmpty (s13, false)]
                [1] traverse s13 s12 s10 {s13 || s12} {s13} 1
                from
                    [1] project [s10 = getField (s4, "b")]
                    [1] limit 1
                    [1] coscan
                in
                    [1] project [s12 = fillEmpty (s10 == s11, false)]
                    [1] limit 1
                    [1] coscan           ] ,
            [s29] [1] project [s29 = s27]
            [1] limit 1
            [1] union [s27] [
                [s25] [1] project [s25 = false]
                [1] filter {! fillEmpty (s20, false)}
                [1] traverse s20 s19 s17 {s20 || s19} {s20} 1
                from
                    [1] project [s17 = getField (s4, "a")]
                    [1] limit 1
                    [1] coscan
                in
                    [1] project [s19 = fillEmpty (s17 == s18, false)]
                    [1] limit 1
                    [1] coscan
                ,
                [s26] [1] project [s26 = fillEmpty (s24, false)]
                [1] traverse s24 s23 s21 {s24 || s23} {s24} 1
                from
                    [1] project [s21 = getField (s4, "b")]
                    [1] limit 1
                    [1] coscan
                in
                    [1] project [s23 = fillEmpty (s21 == s22, false)]
                    [1] limit 1
                    [1] coscan           ]
       ] 

In the above plan, we perform multiple getField lookups for "a" and "b", one for each branch of the $or. This ticket tracks to investigate eliminating such repeated path lookups when constructing SBE plans (credit to nikita.lapkov@mongodb.com for the idea!)



 Comments   
Comment by David Storch [ 27/Mar/23 ]

We have a few benchmarks for scenarios where there are repeated path traversals that could theoretically be optimized by this ticket:

  • AggregationExpressionRepeatedPathTraversal.find
  • AggregationExpressionRepeatedPathTraversalInequality.find
  • AggregationExpressionRepeatedPathTraversalWidePredicate.find

Even without the optimization, SBE already shows a substantial speedup compared to the Classic Engine for these benchmarks. The data shows reductions in latency of 20% or more for SBE vs. Classic.

Comment by Kyle Suarez [ 26/Sep/22 ]

Thanks nikita.lapkov@mongodb.com, I also linked SERVER-69616 as related so we remember to see how that affects performance here.

Comment by Nikita Lapkov (Inactive) [ 23/Sep/22 ]

kyle.suarez@mongodb.com Agreed, this could definitely be an area where ABT might help. Also I noticed that the plan is using a lot of stages, so we should reevaluate the performance impact after SERVER-69616 (use expressions instead of stages for logical operators) is merged.

I linked this ticket to SERVER-69510 to serve as one of the motivational examples.

Comment by Kyle Suarez [ 21/Sep/22 ]

mihai.andrei@mongodb.com, could you make sure we have a benchmark that targets this case? If it turns out there's a regression we need to fix this might need to move back from M4 to M2.

Comment by Kyle Suarez [ 21/Sep/22 ]

nikita.lapkov@mongodb.com's work in SERVER-69638 might be our better overall hope for reclaiming perf here, rather than specifically targeting this case in the stage builders. CC andrew.paroski@mongodb.com, david.storch@mongodb.com, mihai.andrei@mongodb.com

Generated at Thu Feb 08 06:10:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.