Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-56132

[SBE] Apparent infinite loop in SBE execution with spool growing infinitely large

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: Query Execution
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Query Execution 2021-04-19, Query Execution 2021-05-03

      I stumbled upon this somewhat incidentally when investigating a fuzzer failure, which led me to create a repro script (attached as repro.js) that was based off a fuzzer-generated script. The script just inserts some fuzzer generated data, makes a few indexes, and then runs a fuzzer-generate aggregate command. The command involves a $sort which is pushed down to the SBE layer. There are two candidate plans, both requiring a blocking SORT stage. SBE multi-planning occurs, and the following plan wins, represented below as both a QuerySolutionNode tree and an SBE tree:

      PROJ
      ---proj = { _id: true, date: { $const: new Date(-61660742400000) } }
      ---type = DEFAULT
      ---nodeId = 4
      ---fetched = 1
      ---sortedByDiskLoc = 0
      ---providedSorts = {baseSortPattern: {}, ignoredFields: []}
      ---Child:
      ------SORT
      ---------type = SIMPLE
      ---------pattern = { obj.obj.obj.obj.array: 1, _id: 1 }
      ---------limit = 0
      ---------nodeId = 3
      ---------fetched = 1
      ---------sortedByDiskLoc = 0
      ---------providedSorts = {baseSortPattern: {}, ignoredFields: []}
      ---------Child:
      ------------FETCH
      ---------------nodeId = 2
      ---------------fetched = 1
      ---------------sortedByDiskLoc = 0
      ---------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []}
      ---------------Child:
      ------------------IXSCAN
      ---------------------indexName = obj.date_1
      ---------------------keyPattern = { obj.date: 1.0 }
      ---------------------direction = 1
      ---------------------bounds = field #0['obj.date']: [MinKey, new Date(1555269293468)], (new Date(9223372036854775807), MaxKey]
      ---------------------nodeId = 1
      ---------------------fetched = 0
      ---------------------sortedByDiskLoc = 0
      ---------------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []}
      
      [4] traverse s34 s33 s9 {} {}
      from
          [3] sort [s28, s31] [asc, asc] [s9, s10]
          [3] project [s31 = fillEmpty (s30, undefined)]
          [3] traverse s30 s29 s12 {if (s29 <=> s30 < 0, s29, s30)} {}
          from
              [3] project [s28 = fillEmpty (s27, null)]
              [3] traverse s27 s26 s11 {if (s26 <=> s27 < 0, s26, s27)} {}
              from
                  [3] project [s13 = ! isArray (s11) || let [l1.0 = fillEmpty (getField (s11, "obj"), null)] isArray (l1.0) || let [l2.0 = fillEmpty (getField (l1.0, "obj"), null)] isArray (l2.0) || let [l3.0 = fillEmpty (getField (l2.0, "obj"), null)] isArray (l3.0) || isArray (fillEmpty (getField (l3.0, "array"), null)) || ! isArray (s12) || fail ( 2 ,cannot sort with keys that are parallel arrays)]
                  [3] project [s11 = fillEmpty (getField (s9, "obj"), null), s12 = fillEmpty (getField (s9, "_id"), null)]
                  [2] nlj [] [s2]
                      left
                          [1] filter {isRecordId (s2)}
                          [1] lspool sp1 [s2] {! isRecordId (s2)}
                          [1] union [s2] [
                              [s3] [1] project [s3 = KS(0A0104)]
                              [1] limit 1
                              [1] coscan ,
                              [s8] [1] nlj [] [s6]
                                  left
                                      [1] sspool sp1 [s6]
                                  right
                                      [1] chkbounds s4 s5 s8
                                      [1] nlj [] [s7]
                                          left
                                              [1] project [s7 = s6]
                                              [1] limit 1
                                              [1] coscan
                                          right
                                              [1] ixseek s7 s4 s5 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" @"obj.date_1" true
      
      
      
      
                         ]
                      right
                          [2] limit 1
                          [2] seek s2 s9 s10 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" true
      
      
              in
                  [3] project [s26 = fillEmpty (s25, null)]
                  [3] traverse s25 s24 s14 {if (s24 <=> s25 < 0, s24, s25)} {}
                  from
                      [3] project [s14 = getField (s11, "obj")]
                      [3] limit 1
                      [3] coscan
                  in
                      [3] project [s24 = fillEmpty (s23, null)]
                      [3] traverse s23 s22 s15 {if (s22 <=> s23 < 0, s22, s23)} {}
                      from
                          [3] project [s15 = getField (s14, "obj")]
                          [3] limit 1
                          [3] coscan
                      in
                          [3] project [s22 = fillEmpty (s21, null)]
                          [3] traverse s21 s20 s16 {if (s20 <=> s21 < 0, s20, s21)} {}
                          from
                              [3] project [s16 = getField (s15, "obj")]
                              [3] limit 1
                              [3] coscan
                          in
                              [3] project [s20 = fillEmpty (s19, undefined)]
                              [3] traverse s19 s18 s17 {if (s18 <=> s19 < 0, s18, s19)} {}
                              from
                                  [3] project [s17 = fillEmpty (getField (s16, "array"), null)]
                                  [3] limit 1
                                  [3] coscan
                              in
                                  [3] project [s18 = s17]
                                  [3] limit 1
                                  [3] coscan
      
      
      
      
      
          in
              [3] project [s29 = s12]
              [3] limit 1
              [3] coscan
      
      in
          [4] mkbson s33 s9 [_id] keep [date = s32] true false
          [4] project [s32 = -61660742400000]
          [4] limit 1
          [4] coscan
      

      The winning plan exited early, but did not yet reach EOF, so the SBE multi-planning code closes and re-opens the plan:

      https://github.com/mongodb/mongo/blob/a7a795246a7ba15b36f96337c97333d1cf7f8061/src/mongo/db/query/sbe_multi_planner.cpp#L97-L106

      The SBE plan appears to loop infinitely during re-opening. It also looks like the spool sp1, used for the generic index scanning algorithm, is growing unbounded in size. I need to dig more into why this is happening, but my current hypothesis is that closing and re-opening the plan is putting it into some unexpected state, and as a result the stack spool-based index scanning algorithm does not terminate.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: