[SERVER-53656] Execution stats level explain of aggregate command segfaults when SBE is enabled Created: 08/Jan/21 Updated: 29/Oct/23 Resolved: 11/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Sprint: | Query 2021-01-11, Query 2021-01-25 |
| Participants: | |
| Linked BF Score: | 7 |
| Description |
|
See "Steps to Reproduce". I haven't yet dug into the details of why this crash occurs, but it can be reproduced trivially with the short script provided. |
| Comments |
| Comment by Githook User [ 11/Jan/21 ] |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: |
| Comment by David Storch [ 08/Jan/21 ] |
|
This is a use-after-free bug. I've also discovered that it affects pretty much any "executionStats" or "allPlansExecution" explain of an aggregate operation when both 1) the slot-based execution engine is enabled, and 2) the DocumentSource portion of the execution machinery cannot be optimized away, leaving a $cursor stage in the plan.

The bug relates to the fact that, for explain operations, DocumentSourceCursor disposes of but does not free the underlying PlanExecutor once it finishes executing the query. The SBE implementation of PlanExecutor::dispose() frees its execution tree. The PlanExplainerSBE retains an unowned pointer to this now-freed execution tree, and attempts to make use of it in order to produce explain output. The use of _root in this line of code is precisely where the segfault occurs.

I believe that a similar problem does not occur for the classic engine because its dispose() implementation does not actually clean up the underlying PlanStage tree. Perhaps SBE should behave in the same fashion, and should not delete the tree in dispose()? |
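
To make the lifetime problem in the comment above concrete, here is a minimal standalone C++ sketch. It is not MongoDB code: `StageNode`, `Explainer`, `Executor`, `DisposePolicy`, and `g_liveNodes` are all hypothetical names invented for illustration, and the real PlanExecutor and PlanExplainerSBE classes are considerably more involved. The sketch assumes only what the comment states: the explainer holds an unowned pointer to the execution tree, and the executor is disposed but not destroyed before explain output is produced.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Number of StageNode objects currently alive (illustration only).
int g_liveNodes = 0;

// Hypothetical stand-in for the root of an execution tree.
struct StageNode {
    std::string name;
    explicit StageNode(std::string n) : name(std::move(n)) { ++g_liveNodes; }
    ~StageNode() { --g_liveNodes; }
    StageNode(const StageNode&) = delete;
    StageNode& operator=(const StageNode&) = delete;
};

// Hypothetical explainer holding an unowned (raw) pointer to the tree.
class Explainer {
public:
    explicit Explainer(const StageNode* root) : _root(root) {}

    // Only valid while the tree that _root points to is still alive.
    std::string describe() const { return "stage: " + _root->name; }

private:
    const StageNode* _root;  // not owned
};

// Which of the two dispose() behaviours from the comment to model.
enum class DisposePolicy {
    kFreeTree,  // assumed SBE-like: dispose() frees the execution tree
    kKeepTree   // assumed classic-like: dispose() leaves the tree in place
};

// Hypothetical executor that owns the tree and hands out an explainer.
class Executor {
public:
    Executor(std::string rootName, DisposePolicy policy)
        : _root(std::make_unique<StageNode>(std::move(rootName))),
          _policy(policy),
          _explainer(_root.get()) {}

    void dispose() {
        if (_policy == DisposePolicy::kFreeTree) {
            // After this, the explainer's raw pointer dangles; calling
            // describe() would be a use-after-free.
            _root.reset();
        }
    }

    bool treeAlive() const { return _root != nullptr; }
    const Explainer& explainer() const { return _explainer; }

private:
    std::unique_ptr<StageNode> _root;  // declared first so it is initialized first
    DisposePolicy _policy;
    Explainer _explainer;
};
```

With `DisposePolicy::kKeepTree`, calling `explainer().describe()` after `dispose()` is still well defined because the tree outlives the call. With `DisposePolicy::kFreeTree`, the same call would read freed memory, which is the shape of crash described above, so a caller would have to check `treeAlive()` first or gather the explain data before disposing.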