[SERVER-53656] Execution stats level explain of aggregate command segfaults when SBE is enabled Created: 08/Jan/21  Updated: 29/Oct/23  Resolved: 11/Jan/21

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

 ➜ cat repro.js
(function() {
"use strict";
 
const coll = db.explain_sample;
coll.drop();
 
// Turn on SBE.
assert.commandWorked(
    db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: true}));
 
assert.commandWorked(coll.insert({a: 1}));
 
// Explain a $sample pipeline.
printjson(coll.explain("allPlansExecution").aggregate([{$sample: {size: 10}}]));
}());
 ➜ python3 buildscripts/resmoke.py run --suites=core repro.js

Sprint: Query 2021-01-11, Query 2021-01-25
Participants:
Linked BF Score: 7

 Description   

See "Steps to Reproduce". I haven't yet dug into the details of why this crash occurs, but it can be reproduced trivially with the short script provided.



 Comments   
Comment by Githook User [ 11/Jan/21 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-53656 Fix use-after-free in SBE with agg executionStats explain
Branch: master
https://github.com/mongodb/mongo/commit/74bd745cb749d4275777da2e2ab9edd688ded292

Comment by David Storch [ 08/Jan/21 ]

This is a use-after-free bug. I've also discovered that it affects pretty much any "executionStats" or "allPlansExecution" explain of an aggregate operation when both 1) the slot-based execution engine is enabled, and 2) the DocumentSource portion of the execution machinery cannot be optimized away, leaving a $cursor stage in the plan.

The bug stems from the fact that DocumentSourceCursor disposes, but does not free, the underlying PlanExecutor for explain operations once it finishes executing the query. The SBE implementation of PlanExecutor::dispose() frees its execution tree. The PlanExplainerSBE retains an unowned pointer to this now-freed execution tree and attempts to use it to produce explain output. The segfault occurs precisely where PlanExplainerSBE dereferences _root.
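The ownership problem can be modeled in a few lines. This is a minimal sketch, not server code: `Executor`, `Explainer` and `Stage` are hypothetical stand-ins for PlanExecutor, PlanExplainerSBE and the SBE execution tree. Because JavaScript is garbage-collected, "freeing" the tree is simulated with a `freed` flag, and the explainer throws where the C++ code would read freed memory.

```javascript
// Hypothetical model of the use-after-free; NOT MongoDB server code.
// Executor ~ PlanExecutor, Explainer ~ PlanExplainerSBE, Stage ~ SBE tree node.

class Stage {
  constructor() {
    this.docsReturned = 0;
    // JavaScript is garbage-collected, so deallocation is simulated with a flag.
    this.freed = false;
  }
}

class Executor {
  constructor() {
    this.root = new Stage(); // the executor owns the execution tree
  }
  run() {
    this.root.docsReturned = 1;
  }
  // Models the SBE dispose() described above: it frees the tree.
  dispose() {
    this.root.freed = true;
    this.root = null;
  }
}

class Explainer {
  // Keeps an unowned reference to the tree, like PlanExplainerSBE's _root.
  constructor(root) {
    this.root = root;
  }
  executionStats() {
    if (this.root.freed) {
      // In C++ this read would touch freed memory and could segfault.
      throw new Error("use-after-free: execution tree was disposed");
    }
    return { docsReturned: this.root.docsReturned };
  }
}

// Order of events in an executionStats explain that keeps a $cursor stage.
function explainAfterDispose() {
  const exec = new Executor();
  const explainer = new Explainer(exec.root);
  exec.run();
  exec.dispose(); // DocumentSourceCursor disposes the executor once it is exhausted
  return explainer.executionStats(); // explain then reads the tree
}
```

Calling `explainAfterDispose()` throws, mirroring the crash: the explainer's reference outlives the tree it points to.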

I believe a similar problem does not occur in the classic engine because its dispose() implementation does not actually clean up the underlying PlanStage tree. Perhaps SBE should behave the same way and not delete the tree in dispose()?
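A sketch of the alternative floated above, using the same hypothetical stand-in names (`Executor`, `Explainer`, `Stage`): dispose() releases the executor's resources but leaves the tree and its runtime stats in place, so explain can still read them. What dispose() actually releases in the server is not described in this ticket; the `resourcesHeld` flag is a hypothetical placeholder. The ticket also does not say whether the committed fix took this approach.

```javascript
// Hypothetical sketch of a classic-engine-like dispose(); NOT MongoDB server code.

class Stage {
  constructor() {
    this.docsReturned = 0;
    this.resourcesHeld = true; // hypothetical placeholder for resources held during execution
  }
  // Releases resources but keeps the node and its stats readable.
  releaseResources() {
    this.resourcesHeld = false;
  }
}

class Executor {
  constructor() {
    this.root = new Stage(); // the executor owns the execution tree
    this.disposed = false;
  }
  run() {
    this.root.docsReturned = 1;
  }
  // Cleans up resources but does not delete the tree.
  dispose() {
    this.root.releaseResources();
    this.disposed = true;
  }
}

class Explainer {
  // Still an unowned reference, but the tree now outlives dispose().
  constructor(root) {
    this.root = root;
  }
  executionStats() {
    return { docsReturned: this.root.docsReturned };
  }
}

function explainAfterDispose() {
  const exec = new Executor();
  const explainer = new Explainer(exec.root);
  exec.run();
  exec.dispose();
  return { stats: explainer.executionStats(), resourcesHeld: exec.root.resourcesHeld };
}
```

Here `explainAfterDispose()` returns the stats gathered during execution, while the resources have still been released. The trade-off is that the tree's memory lives until the executor itself is destroyed.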

Generated at Thu Feb 08 05:31:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.