[SERVER-56132] [SBE] Apparent infinite loop in SBE execution with spool growing infinitely large Created: 15/Apr/21  Updated: 29/Oct/23  Resolved: 22/Apr/21

Status: Closed
Project: Core Server
Component/s: Query Execution
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query Execution 2021-04-19, Query Execution 2021-05-03
Participants:

 Description   

I stumbled upon this somewhat incidentally when investigating a fuzzer failure, which led me to create a repro script (attached as repro.js) based on a fuzzer-generated script. The script just inserts some fuzzer-generated data, creates a few indexes, and then runs a fuzzer-generated aggregate command. The command involves a $sort which is pushed down to the SBE layer. There are two candidate plans, both requiring a blocking SORT stage. SBE multi-planning occurs, and the following plan wins, represented below as both a QuerySolutionNode tree and an SBE tree:

PROJ
---proj = { _id: true, date: { $const: new Date(-61660742400000) } }
---type = DEFAULT
---nodeId = 4
---fetched = 1
---sortedByDiskLoc = 0
---providedSorts = {baseSortPattern: {}, ignoredFields: []}
---Child:
------SORT
---------type = SIMPLE
---------pattern = { obj.obj.obj.obj.array: 1, _id: 1 }
---------limit = 0
---------nodeId = 3
---------fetched = 1
---------sortedByDiskLoc = 0
---------providedSorts = {baseSortPattern: {}, ignoredFields: []}
---------Child:
------------FETCH
---------------nodeId = 2
---------------fetched = 1
---------------sortedByDiskLoc = 0
---------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []}
---------------Child:
------------------IXSCAN
---------------------indexName = obj.date_1
---------------------keyPattern = { obj.date: 1.0 }
---------------------direction = 1
---------------------bounds = field #0['obj.date']: [MinKey, new Date(1555269293468)], (new Date(9223372036854775807), MaxKey]
---------------------nodeId = 1
---------------------fetched = 0
---------------------sortedByDiskLoc = 0
---------------------providedSorts = {baseSortPattern: { obj.date: 1 }, ignoredFields: []}
 
[4] traverse s34 s33 s9 {} {}
from
    [3] sort [s28, s31] [asc, asc] [s9, s10]
    [3] project [s31 = fillEmpty (s30, undefined)]
    [3] traverse s30 s29 s12 {if (s29 <=> s30 < 0, s29, s30)} {}
    from
        [3] project [s28 = fillEmpty (s27, null)]
        [3] traverse s27 s26 s11 {if (s26 <=> s27 < 0, s26, s27)} {}
        from
            [3] project [s13 = ! isArray (s11) || let [l1.0 = fillEmpty (getField (s11, "obj"), null)] isArray (l1.0) || let [l2.0 = fillEmpty (getField (l1.0, "obj"), null)] isArray (l2.0) || let [l3.0 = fillEmpty (getField (l2.0, "obj"), null)] isArray (l3.0) || isArray (fillEmpty (getField (l3.0, "array"), null)) || ! isArray (s12) || fail ( 2 ,cannot sort with keys that are parallel arrays)]
            [3] project [s11 = fillEmpty (getField (s9, "obj"), null), s12 = fillEmpty (getField (s9, "_id"), null)]
            [2] nlj [] [s2]
                left
                    [1] filter {isRecordId (s2)}
                    [1] lspool sp1 [s2] {! isRecordId (s2)}
                    [1] union [s2] [
                        [s3] [1] project [s3 = KS(0A0104)]
                        [1] limit 1
                        [1] coscan ,
                        [s8] [1] nlj [] [s6]
                            left
                                [1] sspool sp1 [s6]
                            right
                                [1] chkbounds s4 s5 s8
                                [1] nlj [] [s7]
                                    left
                                        [1] project [s7 = s6]
                                        [1] limit 1
                                        [1] coscan
                                    right
                                        [1] ixseek s7 s4 s5 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" @"obj.date_1" true
                   ]
                right
                    [2] limit 1
                    [2] seek s2 s9 s10 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" true
        in
            [3] project [s26 = fillEmpty (s25, null)]
            [3] traverse s25 s24 s14 {if (s24 <=> s25 < 0, s24, s25)} {}
            from
                [3] project [s14 = getField (s11, "obj")]
                [3] limit 1
                [3] coscan
            in
                [3] project [s24 = fillEmpty (s23, null)]
                [3] traverse s23 s22 s15 {if (s22 <=> s23 < 0, s22, s23)} {}
                from
                    [3] project [s15 = getField (s14, "obj")]
                    [3] limit 1
                    [3] coscan
                in
                    [3] project [s22 = fillEmpty (s21, null)]
                    [3] traverse s21 s20 s16 {if (s20 <=> s21 < 0, s20, s21)} {}
                    from
                        [3] project [s16 = getField (s15, "obj")]
                        [3] limit 1
                        [3] coscan
                    in
                        [3] project [s20 = fillEmpty (s19, undefined)]
                        [3] traverse s19 s18 s17 {if (s18 <=> s19 < 0, s18, s19)} {}
                        from
                            [3] project [s17 = fillEmpty (getField (s16, "array"), null)]
                            [3] limit 1
                            [3] coscan
                        in
                            [3] project [s18 = s17]
                            [3] limit 1
                            [3] coscan
    in
        [3] project [s29 = s12]
        [3] limit 1
        [3] coscan
 
in
    [4] mkbson s33 s9 [_id] keep [date = s32] true false
    [4] project [s32 = -61660742400000]
    [4] limit 1
    [4] coscan

The winning plan exited its trial period early, without reaching EOF, so the SBE multi-planning code closes and re-opens the plan:

https://github.com/mongodb/mongo/blob/a7a795246a7ba15b36f96337c97333d1cf7f8061/src/mongo/db/query/sbe_multi_planner.cpp#L97-L106

The SBE plan appears to loop infinitely during re-opening. It also looks like the spool sp1, used for the generic index scanning algorithm, is growing without bound. I need to dig more into why this is happening, but my current hypothesis is that closing and re-opening the plan leaves it in some unexpected state, and as a result the stack-spool-based index scanning algorithm does not terminate.
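For context, the behavior performed by the linked multi-planner code can be sketched as follows. This is a simplified model under assumptions, not the actual implementation; Stage and prepareWinnerForExecution are hypothetical stand-ins for the real SBE PlanStage interface and multi-planner logic:

// Hypothetical, simplified model of an SBE execution stage.
struct Stage {
    virtual ~Stage() = default;
    virtual void open(bool reOpen) = 0;
    virtual void close() = 0;
};

// Sketch of what the multi-planner does when the winning candidate exited its
// trial period early: close the plan, then open it again from scratch with
// reOpen == false. It is this close()/open(false) cycle that exposes the bug.
void prepareWinnerForExecution(Stage& winningRoot, bool exitedEarly) {
    if (exitedEarly) {
        winningRoot.close();
        winningRoot.open(false /* reOpen */);
    }
}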

 Comments   
Comment by Githook User [ 23/Apr/21 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-56132 Don't use stack spool consumer with eager producer in SBE unit test
Branch: master
https://github.com/mongodb/mongo/commit/4f34fa3e87c808cb06bc0f282ac7a4d05e0afd3c

Comment by Githook User [ 22/Apr/21 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-56132 Fix SBE spool producers' close() methods
Branch: master
https://github.com/mongodb/mongo/commit/ce33b7c241aa957ac2dd3e5b073532b5a9c4ee0e

Comment by Anton Korshunov [ 19/Apr/21 ]

david.storch Not closing a child in close() is clearly a bug. When reOpen is false, the assumption was that, since the tree is being opened for the first time, all PlanStages would already be in their initial state and there would be no need to clear the buffer. But since we do not clear the buffer on close() either, an attempt to open the tree a second time, after it was closed, could also result in an incorrect state. So, your proposed fix makes total sense.

Comment by David Storch [ 16/Apr/21 ]

I believe I now understand why the problem described above results in an infinite loop. The failure to properly close and reopen the union stage leaves the union with bad internal state. Namely, UnionStage::_currentStage is a pointer to the second branch of the union, which contains the spool consumer and the ixscan stage. However, UnionStage::_currentStageIndex has a value of 0. This means that when stages higher in the tree attempt to read slot s2, the output of the union stage, they always end up reading s3 (the output slot for the first branch) rather than s8 (the output slot for the second branch). The value of s3 is always equal to the constant KeyString value KS(0A0104). Consequently, the spool simply fills up with many copies of the value KS(0A0104).
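A minimal sketch of that failure mode, using hypothetical names rather than the real UnionStage internals (the assumption here is that the branch index is what resolves the output slot, while a separate child pointer is what actually advances):

#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a union-like stage that routes reads of
// its output slot through a branch index.
struct MiniUnion {
    std::vector<std::vector<int>> branchOutputs;  // one output "slot" per branch
    std::size_t currentStageIndex = 0;            // left stale by the bad close/reopen

    // Reads of the union's output slot (s2 in the plan above) resolve through
    // the index, not through the child that is actually producing rows.
    int readOutputSlot() const {
        return branchOutputs[currentStageIndex].back();
    }
};

// With the index stuck at 0 while the second branch is the one advancing,
// every read returns the first branch's constant value -- the analogue of s2
// always yielding KS(0A0104), so the spool fills with copies of that key string.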

Comment by David Storch [ 16/Apr/21 ]

Aha! I've found the problem. Two problems, actually. The first is that SpoolLazyProducerStage::close() fails to close its child stage. This means that when the multi-planning trial period ends with a QueryTrialRunCompleted early exit, the plan is improperly closed. The entire subtree beneath the lspool stage remains open:

                    [1] lspool sp1 [s2] {! isRecordId (s2)}
                    [1] union [s2] [
                        [s3] [1] project [s3 = KS(0A0104)]
                        [1] limit 1
                        [1] coscan ,
                        [s8] [1] nlj [] [s6]
                            left
                                [1] sspool sp1 [s6]
                            right
                                [1] chkbounds s4 s5 s8
                                [1] nlj [] [s7]
                                    left
                                        [1] project [s7 = s6]
                                        [1] limit 1
                                        [1] coscan
                                    right
                                        [1] ixseek s7 s4 s5 [] @"174bb7e1-898e-4b10-bb69-e4b63bd456e2" @"obj.date_1" true

When the tree is subsequently re-opened to start post-trial execution, the nlj subtree is untouched and is left with stale state. This somehow causes execution of the plan to never terminate. I don't yet fully understand exactly why, but this is definitely the root cause of the infinite loop.
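The invariant being violated can be sketched as follows. ExampleStage is hypothetical, not the real SBE PlanStage hierarchy; the point is only that close() must recurse into children so that a later open(false) really starts from a clean slate:

#include <memory>
#include <utility>

// Hypothetical stage illustrating the open/close contract the multi-planner
// relies on when it fully closes and then re-opens the winning plan.
class ExampleStage {
public:
    explicit ExampleStage(std::unique_ptr<ExampleStage> child)
        : _child(std::move(child)) {}

    void open(bool reOpen) {
        if (_child) {
            _child->open(reOpen);
        }
        // ... reset this stage's own state ...
    }

    void close() {
        if (_child) {
            _child->close();  // omitting this recursive call is the first bug above
        }
        // ... release this stage's own resources ...
    }

private:
    std::unique_ptr<ExampleStage> _child;
};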

The other bug is that when the plan is closed and re-opened, sp1 is left holding the value KS(0A0104) without being cleared. This is because the spool producer stages only clear the spool buffer when reOpen is true.

In this case, reOpen is false because the plan was closed and is being fully reopened from scratch. I think we should be clearing the buffer on open() no matter what the value of reOpen is? Or should we actually be clearing the buffer when the spool producer is closed?

The following patch appears to resolve the issue:

diff --git a/src/mongo/db/exec/sbe/stages/spool.cpp b/src/mongo/db/exec/sbe/stages/spool.cpp
index caffc5c77d..568c24d9f2 100644
--- a/src/mongo/db/exec/sbe/stages/spool.cpp
+++ b/src/mongo/db/exec/sbe/stages/spool.cpp
@@ -79,9 +79,7 @@ void SpoolEagerProducerStage::open(bool reOpen) {
     _commonStats.opens++;
     _children[0]->open(reOpen);
 
-    if (reOpen) {
-        _buffer->clear();
-    }
+    _buffer->clear();
 
     while (_children[0]->getNext() == PlanState::ADVANCED) {
         value::MaterializedRow vals{_inAccessors.size()};
@@ -219,9 +217,7 @@ void SpoolLazyProducerStage::open(bool reOpen) {
     _commonStats.opens++;
     _children[0]->open(reOpen);
 
-    if (reOpen) {
-        _buffer->clear();
-    }
+    _buffer->clear();
 }
 
 PlanState SpoolLazyProducerStage::getNext() {
@@ -266,6 +262,7 @@ void SpoolLazyProducerStage::close() {
     auto optTimer(getOptTimer(_opCtx));
 
     _commonStats.closes++;
+    _children[0]->close();
 }
 
 std::unique_ptr<PlanStageStats> SpoolLazyProducerStage::getStats(bool includeDebugInfo) const {

I'd like to consult with anton.korshunov regarding the semantics for when the spool buffer should be cleared, and regarding options for testing this fix.
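For what it's worth, the scenario is easy to exercise in isolation. The sketch below uses hypothetical SpoolBuffer/LazyProducer types rather than the actual SBE unit-test harness, and encodes the proposed semantics (clear the buffer on every open(), close the child on close()), so a full close()/open(false) cycle never observes rows left over from the trial run:

#include <cassert>
#include <vector>

// Hypothetical stand-in for the spool buffer shared by producer and consumer.
struct SpoolBuffer {
    std::vector<int> rows;
};

// Hypothetical lazy producer with the proposed semantics: clear the buffer on
// every open(), regardless of reOpen.
struct LazyProducer {
    SpoolBuffer* buffer;
    void open(bool /*reOpen*/) { buffer->rows.clear(); }
    void push(int v) { buffer->rows.push_back(v); }
    void close() { /* with the patch, the child stage would also be closed here */ }
};

int main() {
    SpoolBuffer buf;
    LazyProducer producer{&buf};

    producer.open(false);
    producer.push(42);         // the trial run leaves a row behind in the spool
    producer.close();

    producer.open(false);      // full re-open after the trial; reOpen == false
    assert(buf.rows.empty());  // without the fix, the stale row would still be visible
    return 0;
}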
