[SERVER-61918] SBE tracking of open stages is not exception safe Created: 04/Dec/21  Updated: 27/Oct/23  Resolved: 28/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Ian Boros Assignee: Drew Paroski
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-61422 Update SBE filter stage builder to us... Closed
Related
related to SERVER-67630 [SBE] SortStage is not guaranteed to ... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2021-12-27, QE 2022-01-10, QE 2022-02-07, QE 2022-01-24, QE 2023-02-20, QE 2023-03-06, QE 2023-03-20, QE 2023-04-03
Participants:

 Description   

A number of stages track whether their child is open via a _childOpened boolean flag. This flag is usually set after calling open() on the child. For example:

 

    // From FilterStage.
    void open(bool reOpen) final {
        // <Other logic omitted>
        _children[0]->open(reOpen);  // This might throw!
        _childOpened = true;         // Never reached if open() throws.
    }

Unfortunately, a call to open() may throw in order to abort a trial period. If this happens, the stage's _childOpened flag is never set to true, so a subsequent call to close() on the parent stage will not close the child stage, even though the child may already be partially opened.

    // From FilterStage.
    void close() final {
        // <Omitted>
        if (_childOpened) {
            // This branch isn't taken if the exception is thrown on open().
            _children[0]->close();
            _childOpened = false;
        }
    } 

In other words, after the exception is thrown, even calling close() and then open() on the plan leaves the tree in an invalid state. Attempting to use the plan afterwards may result in a server crash.
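To make the failure mode concrete, here is a minimal standalone sketch. It does not use the real SBE classes: Stage, ThrowingChild, ParentStage, and the markBeforeOpen switch are all hypothetical names invented for illustration. It contrasts the current ordering (flag set after open()) with one possible fix (flag set before open()). The fix assumes that calling close() on a child whose open() threw partway through is safe, which would need to be checked for each real stage.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an SBE stage; not the real PlanStage interface.
struct Stage {
    virtual ~Stage() = default;
    virtual void open(bool reOpen) = 0;
    virtual void close() = 0;
};

// A child that acquires some state and then throws from open(),
// mimicking an exception used to abort a trial period.
struct ThrowingChild : Stage {
    bool resourceHeld = false;
    void open(bool) override {
        resourceHeld = true;
        throw std::runtime_error("trial period aborted");
    }
    void close() override { resourceHeld = false; }
};

// A parent that tracks its child with a _childOpened flag.
// markBeforeOpen is an illustrative switch, not something in the server code.
struct ParentStage : Stage {
    ParentStage(Stage* child, bool markBeforeOpen)
        : _child(child), _markBeforeOpen(markBeforeOpen) {}

    void open(bool reOpen) override {
        if (_markBeforeOpen) {
            // Possible fix: record the open attempt first, so close()
            // still reaches the child if open() throws.
            _childOpened = true;
            _child->open(reOpen);
        } else {
            // Pattern from the ticket: the flag is skipped on a throw.
            _child->open(reOpen);
            _childOpened = true;
        }
    }

    void close() override {
        if (_childOpened) {
            _child->close();
            _childOpened = false;
        }
    }

    Stage* _child;
    bool _markBeforeOpen;
    bool _childOpened = false;
};

// Returns true if the child was cleaned up after a failed open()
// followed by close() on the parent.
bool childReleasedAfterFailedOpen(bool markBeforeOpen) {
    ThrowingChild child;
    ParentStage parent(&child, markBeforeOpen);
    try {
        parent.open(false);
    } catch (const std::runtime_error&) {
        // Swallow the exception, as a runtime planner retrying the plan might.
    }
    parent.close();
    return !child.resourceHeld;
}
```

With markBeforeOpen set to false the child keeps its state after close(), which is the problem described above. With it set to true the child is closed. A scope guard that resets the flag only on success would be another option; this sketch just shows the simplest reordering.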

We should do a full audit of all of the SBE stages and make sure this pattern is fixed in every case. Here are the two places I've encountered it:

https://github.com/mongodb/mongo/blob/fbe42b59f77c645413ebb60f6d11df7acf9612ee/src/mongo/db/exec/sbe/stages/filter.h#L98-L99

https://github.com/mongodb/mongo/blob/fbe42b59f77c645413ebb60f6d11df7acf9612ee/src/mongo/db/exec/sbe/stages/hash_agg.cpp#L167-L168

 



 Comments   
Comment by David Storch [ 28/Mar/23 ]

This has gone away. We no longer attempt to reuse an SBE plan that has thrown an exception in any of the runtime planners.

Comment by Anton Korshunov [ 17/Feb/22 ]

kyle.suarez, we will fix it in SERVER-61422.

Comment by Kyle Suarez [ 16/Feb/22 ]

Removing this from the current sprint.

anton.korshunov, do you have a SERVER ticket handy for the work in the SBE Plan Cache project? I'll link this ticket as depending on that one, so we eventually circle back to this in the future.

Comment by Ian Boros [ 07/Dec/21 ]

I don't have a repro, but I think the issue should be clear enough without one. I did run into this while working on a branch though.

Comment by Kyle Suarez [ 07/Dec/21 ]

ian.boros, do you have an example of a crash? We discussed in triage and it sounds like this would be a problem during multiplanning.
