-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Query Execution
-
Labels:None
-
Fully Compatible
-
ALL
-
Query Execution 2021-04-19, Query Execution 2021-05-03
I stumbled upon this somewhat incidentally when investigating a fuzzer failure, which led me to create a repro script (attached as repro.js) that was based off a fuzzer-generated script. The script just inserts some fuzzer generated data, makes a few indexes, and then runs a fuzzer-generate aggregate command. The command involves a $sort which is pushed down to the SBE layer. There are two candidate plans, both requiring a blocking SORT stage. SBE multi-planning occurs, and the following plan wins, represented below as both a QuerySolutionNode tree and an SBE tree:
PROJ ---proj = { _id: true, date: { $const: new Date(-61660742400000) } } ---type = DEFAULT ---nodeId = 4 ---fetched = 1 ---sortedByDiskLoc = 0 ---providedSorts = {baseSortPattern: {}, ignoredFields: []} ---Child: ------SORT ---------type = SIMPLE ---------pattern = { obj.obj.obj.obj.array: 1, _id: 1 } ---------limit = 0 ---------nodeId = 3 ---------fetched = 1 ---------sortedByDiskLoc = 0 ---------providedSorts = {baseSortPattern: {}, ignoredFields: []} ---------Child: ------------FETCH ---------------nodeId = 2 ---------------fetched = 1 ---------------sortedByDiskLoc = 0 ---------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []} ---------------Child: ------------------IXSCAN ---------------------indexName = obj.date_1 ---------------------keyPattern = { obj.date: 1.0 } ---------------------direction = 1 ---------------------bounds = field #0['obj.date']: [MinKey, new Date(1555269293468)], (new Date(9223372036854775807), MaxKey] ---------------------nodeId = 1 ---------------------fetched = 0 ---------------------sortedByDiskLoc = 0 ---------------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []} [4] traverse s34 s33 s9 {} {} from [3] sort [s28, s31] [asc, asc] [s9, s10] [3] project [s31 = fillEmpty (s30, undefined)] [3] traverse s30 s29 s12 {if (s29 <=> s30 < 0, s29, s30)} {} from [3] project [s28 = fillEmpty (s27, null)] [3] traverse s27 s26 s11 {if (s26 <=> s27 < 0, s26, s27)} {} from [3] project [s13 = ! isArray (s11) || let [l1.0 = fillEmpty (getField (s11, "obj"), null)] isArray (l1.0) || let [l2.0 = fillEmpty (getField (l1.0, "obj"), null)] isArray (l2.0) || let [l3.0 = fillEmpty (getField (l2.0, "obj"), null)] isArray (l3.0) || isArray (fillEmpty (getField (l3.0, "array"), null)) || ! isArray (s12) || fail ( 2 ,cannot sort with keys that are parallel arrays)] [3] project [s11 = fillEmpty (getField (s9, "obj"), null), s12 = fillEmpty (getField (s9, "_id"), null)] [2] nlj [] [s2] left [1] filter {isRecordId (s2)} [1] lspool sp1 [s2] {! isRecordId (s2)} [1] union [s2] [ [s3] [1] project [s3 = KS(0A0104)] [1] limit 1 [1] coscan , [s8] [1] nlj [] [s6] left [1] sspool sp1 [s6] right [1] chkbounds s4 s5 s8 [1] nlj [] [s7] left [1] project [s7 = s6] [1] limit 1 [1] coscan right [1] ixseek s7 s4 s5 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" @"obj.date_1" true ] right [2] limit 1 [2] seek s2 s9 s10 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" true in [3] project [s26 = fillEmpty (s25, null)] [3] traverse s25 s24 s14 {if (s24 <=> s25 < 0, s24, s25)} {} from [3] project [s14 = getField (s11, "obj")] [3] limit 1 [3] coscan in [3] project [s24 = fillEmpty (s23, null)] [3] traverse s23 s22 s15 {if (s22 <=> s23 < 0, s22, s23)} {} from [3] project [s15 = getField (s14, "obj")] [3] limit 1 [3] coscan in [3] project [s22 = fillEmpty (s21, null)] [3] traverse s21 s20 s16 {if (s20 <=> s21 < 0, s20, s21)} {} from [3] project [s16 = getField (s15, "obj")] [3] limit 1 [3] coscan in [3] project [s20 = fillEmpty (s19, undefined)] [3] traverse s19 s18 s17 {if (s18 <=> s19 < 0, s18, s19)} {} from [3] project [s17 = fillEmpty (getField (s16, "array"), null)] [3] limit 1 [3] coscan in [3] project [s18 = s17] [3] limit 1 [3] coscan in [3] project [s29 = s12] [3] limit 1 [3] coscan in [4] mkbson s33 s9 [_id] keep [date = s32] true false [4] project [s32 = -61660742400000] [4] limit 1 [4] coscan
The winning plan exited early, but did not yet reach EOF, so the SBE multi-planning code closes and re-opens the plan:
The SBE plan appears to loop infinitely during re-opening. It also looks like the spool sp1, used for the generic index scanning algorithm, is growing unbounded in size. I need to dig more into why this is happening, but my current hypothesis is that closing and re-opening the plan is putting it into some unexpected state, and as a result the stack spool-based index scanning algorithm does not terminate.