[SERVER-61918] SBE tracking of open stages is not exception safe Created: 04/Dec/21 Updated: 27/Oct/23 Resolved: 28/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Ian Boros | Assignee: | Drew Paroski |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Sprint: | QE 2021-12-27, QE 2022-01-10, QE 2022-02-07, QE 2022-01-24, QE 2023-02-20, QE 2023-03-06, QE 2023-03-20, QE 2023-04-03 |
| Participants: | |
| Description |
|
A number of stages track whether their child is open via a _childOpened boolean flag. This flag is usually set after calling open() on the child. For example:
Unfortunately, a call to open() may throw, in order to abort a trial period. If this happens, the stage's _childOpened flag is never set to true. This means that a call to close() on the parent stage will not result in the child stage being closed.
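The failure mode can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: ChildStage, UnsafeParent, SaferParent and leaksAfterFailedOpen are stand-ins invented for illustration, not the real SBE PlanStage classes. Only the _childOpened flag name comes from the ticket. The "safer" variant is one possible approach rather than the fix adopted in the server, and it assumes that close() is safe to call on a child whose open() threw part-way through.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a child stage; not MongoDB's real PlanStage API.
struct ChildStage {
    bool throwOnOpen = false;
    bool resourcesHeld = false;

    void open() {
        resourcesHeld = true;  // e.g. a cursor or lock is acquired
        if (throwOnOpen) {
            throw std::runtime_error("trial period aborted");
        }
    }

    void close() {
        resourcesHeld = false;
    }
};

// The pattern described in the ticket: the flag is set only after open() returns.
struct UnsafeParent {
    explicit UnsafeParent(ChildStage& c) : child(c) {}

    void open() {
        child.open();         // may throw
        _childOpened = true;  // never reached if open() throws
    }

    void close() {
        if (_childOpened) {
            child.close();
            _childOpened = false;
        }
    }

    ChildStage& child;
    bool _childOpened = false;
};

// One possible variant (an assumption, not the server's fix): record the attempt
// before calling open(), so close() still reaches the child after a throw.
struct SaferParent {
    explicit SaferParent(ChildStage& c) : child(c) {}

    void open() {
        _childOpened = true;
        child.open();  // may throw; the flag is already set
    }

    void close() {
        if (_childOpened) {
            child.close();
            _childOpened = false;
        }
    }

    ChildStage& child;
    bool _childOpened = false;
};

// Returns true if the child still holds resources after its open() throws
// and the parent is subsequently closed.
template <typename Parent>
bool leaksAfterFailedOpen() {
    ChildStage child;
    child.throwOnOpen = true;
    Parent parent(child);
    try {
        parent.open();
    } catch (const std::runtime_error&) {
        // The caller abandons the trial and closes the plan.
    }
    parent.close();
    return child.resourcesHeld;
}
```

In this sketch, leaksAfterFailedOpen<UnsafeParent>() reports that the child is left open, while leaksAfterFailedOpen<SaferParent>() reports that it was closed. A scope guard that closes the child on exception would be another way to get the same effect.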
In other words, after the exception is thrown, even calling close() and then open() on the plan leaves the tree in an invalid state. Attempting to use the plan may result in a server crash. We should do a full audit of all of the SBE stages and make sure this pattern is fixed in every case. Here are the two places I've encountered it:
|
| Comments |
| Comment by David Storch [ 28/Mar/23 ] |
|
This has gone away. We no longer attempt to reuse an SBE plan that has thrown an exception in any of the runtime planners. |
| Comment by Anton Korshunov [ 17/Feb/22 ] |
|
kyle.suarez, we will fix it in |
| Comment by Kyle Suarez [ 16/Feb/22 ] |
|
Removing this from the current sprint. anton.korshunov, do you have a SERVER ticket handy for the work in the SBE Plan Cache project? I'll link this ticket as depending on that one, so we eventually circle back to this in the future. |
| Comment by Ian Boros [ 07/Dec/21 ] |
|
I don't have a repro, but I think the issue should be clear enough without one. I did run into this while working on a branch though. |
| Comment by Kyle Suarez [ 07/Dec/21 ] |
|
ian.boros, do you have an example of a crash? We discussed in triage and it sounds like this would be a problem during multiplanning. |