[SERVER-78353] [CQF] Investigate reference tracker stack-overflow Created: 22/Jun/23  Updated: 29/Aug/23  Resolved: 29/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Matt Boros
Assignee: David Percy
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-78354 [CQF] Implement transport infrastruct... Closed
Related
related to SERVER-62509 Write tests to stress ABT and Bonsai Closed
related to SERVER-80483 [CQF] Split last-ref analysis into a ... Closed
Assigned Teams:
Query Optimization
Sprint: QO 2023-07-24, QO 2023-08-07, QO 2023-08-21, QO 2023-09-04
Participants:

 Description   

The maximum number of aggregation stages allowed is 1000 for non-debug builds. For the following pipeline (and other long pipelines) the classic engine has no issues, while Bonsai fails with a stack overflow in the first call to the reference tracker.

db.c.drop();
const stage = {$addFields: {a: {$add: ["$a", "$b"]}}};
// Build a pipeline of 990 identical stages.
const pipeline = Array(990).fill(stage);
db.c.aggregate(pipeline).toArray();
// Bonsai: stack overflow in the reference tracker. The classic engine succeeds.

We should investigate why the stack frames are so large. Are we allocating anything unnecessarily?

The overall goal is to determine whether the transport infrastructure needs to change fundamentally to handle deep trees, or whether a less extreme change would suffice.
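
To make the depth problem concrete, here is a minimal sketch in plain JavaScript, not the server's C++. It assumes each stage becomes a node whose only child is the stage before it, so an N-stage pipeline turns into a chain N nodes deep. The names buildChain and recursiveDepth are hypothetical and exist only for this illustration. Under that assumption a recursive walk needs one call frame per stage, so stack use grows as stage count times frame size.

```javascript
// Hypothetical model: each pipeline stage wraps the previous stage as its only child.
function buildChain(stageCount) {
    let node = {name: "scan", child: null};
    for (let i = 0; i < stageCount; i++) {
        node = {name: "addFields", child: node};
    }
    return node;
}

// Recursive walk: returns the number of nested calls needed to reach the leaf.
function recursiveDepth(node) {
    if (node.child === null) {
        return 1;
    }
    return 1 + recursiveDepth(node.child);
}

const depth = recursiveDepth(buildChain(990));
console.log(depth);  // 991: 990 stages plus the leaf
```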



 Comments   
Comment by David Percy [ 29/Aug/23 ]

After some discussion we've decided to:

  • Try changing the implementation of algebra::transport to handle trees of any depth (SERVER-78354), as long as we can preserve the interface exactly.
  • If that doesn't work out or isn't worth doing right now, change the reftracker transport to use unique_ptr. Allocating those temporary objects may be wasteful, but it would definitely fix the stack overflow, and we can come back to it if necessary.
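
The first option can be sketched as a post-order walk that keeps its pending nodes on an explicit, heap-allocated stack. This is plain JavaScript for illustration, not algebra::transport itself. The node shape ({name, children}) and the names transportIterative and visit are assumptions. The callback receives a node together with its children's results, bottom-up as in a recursive walk, which is the property that would let callers stay unchanged.

```javascript
// Post-order traversal using an explicit stack instead of recursion.
// visit(node, childResults) is called only after all of the node's children.
function transportIterative(root, visit) {
    const results = [];  // results of finished subtrees, in completion order
    const stack = [{node: root, expanded: false}];
    while (stack.length > 0) {
        const frame = stack[stack.length - 1];
        if (!frame.expanded) {
            // First visit: schedule the children and come back to this node later.
            frame.expanded = true;
            // Push in reverse so the first child is processed first.
            for (let i = frame.node.children.length - 1; i >= 0; i--) {
                stack.push({node: frame.node.children[i], expanded: false});
            }
        } else {
            // Second visit: the children's results are the last n entries of `results`.
            stack.pop();
            const n = frame.node.children.length;
            const childResults = results.splice(results.length - n, n);
            results.push(visit(frame.node, childResults));
        }
    }
    return results[0];
}

// Usage: a chain far deeper than the 990-stage pipeline in this ticket.
let tree = {name: "scan", children: []};
for (let i = 0; i < 100000; i++) {
    tree = {name: "addFields", children: [tree]};
}
const nodeCount = transportIterative(
    tree, (node, kids) => 1 + kids.reduce((a, b) => a + b, 0));
console.log(nodeCount);  // 100001
```

The cost is one small heap object per pending node, but the depth limit now comes from available memory rather than from the thread's stack size.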
Comment by Matt Boros [ 28/Jun/23 ]

I've found that this is also an issue for M2 queries. Repeating {$project: {a: 1}} one thousand times also causes a stack overflow in the reference tracker.
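
A repro for this case, in the same form as the script in the description. The stage and the count of one thousand come from this comment; the collection name c is reused from the description, and buildPipeline is a hypothetical helper. The aggregate call is guarded so the snippet also runs outside the shell.

```javascript
// Build a pipeline that repeats one stage `count` times.
function buildPipeline(stage, count) {
    return Array.from({length: count}, () => stage);
}

const projectPipeline = buildPipeline({$project: {a: 1}}, 1000);

// `db` only exists in the mongo shell; skip the query elsewhere.
if (typeof db !== "undefined") {
    db.c.drop();
    db.c.aggregate(projectPipeline).toArray();
}
```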

Comment by Matt Boros [ 22/Jun/23 ]

Note that this failure occurs in the first call to the reference tracker, before the optimizer does any rewrites or other real work. Even if we fix the reference tracker, Bonsai may still hit stack overflows elsewhere for this query.
