[SERVER-60608] Avoid re-materializing the entire document between filter/project/sort/limit/skip and group in common cases Created: 11/Oct/21  Updated: 06/Dec/22  Resolved: 18/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Yoon Soo Kim Assignee: Backlog - Query Execution
Resolution: Won't Do Votes: 0
Labels: sbe
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

We should be able to eliminate mkbson stage(s) to avoid unnecessary materialization when multiple pipeline stages are pushed down together to SBE.

So far, we have been able to avoid a mkbson stage directly on top of a collection scan (SERVER-60101) and a mkbson stage between two group stages (SERVER-60484).

We need to generalize the idea so that it can be applied to other plan shapes as well.

Ian's thought on a project stage:

Another thing I was just thinking we'll need is a way for a stage to indicate that a field "might" exist but has to be looked up in a document. For example, if we have the exclusion projection {a: 0, b: 0} and a subsequent stage needs field "c", it will need to know that it has to look that field up in an object somewhere. At the same time, if a subsequent stage needs field "a", it should be able to find out at query compile time that "a" does not exist. How to represent all of this in the stage builder, I'm not sure yet. Maybe each field maps to a std::variant or something like that. I'm just spitballing here though.
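One possible shape for the std::variant idea above, as a minimal sketch: each field a later stage asks for resolves at compile time to one of three hypothetical states. The names Missing, InSlot, LookupIn and resolveField are illustrative assumptions, not stage builder APIs, and slots are modeled as strings rather than real SBE slot IDs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <variant>

// Hypothetical: the field is known at compile time not to exist
// (e.g. it was removed by an exclusion projection such as {a: 0}).
struct Missing {};
// Hypothetical: the field's value is already available in a named slot.
struct InSlot { std::string slotName; };
// Hypothetical: the field may or may not exist; it must be looked up at
// runtime in the object held in the named slot.
struct LookupIn { std::string objSlotName; };

using FieldInfo = std::variant<Missing, InSlot, LookupIn>;

// Resolves a top-level field requested by a subsequent stage after an
// exclusion projection over the document held in `inputObjSlot`.
// `fieldSlots` lists fields that an earlier stage already placed in slots.
FieldInfo resolveField(const std::set<std::string>& excluded,
                       const std::map<std::string, std::string>& fieldSlots,
                       const std::string& inputObjSlot,
                       const std::string& field) {
    if (excluded.count(field) != 0) {
        return Missing{};  // e.g. "a" after {a: 0, b: 0}
    }
    if (auto it = fieldSlots.find(field); it != fieldSlots.end()) {
        return InSlot{it->second};  // no lookup needed
    }
    return LookupIn{inputObjSlot};  // e.g. "c": must be looked up in the object
}
```

With excluded = {"a", "b"}, a request for "a" resolves to Missing at compile time, while "c" resolves to LookupIn on the input object slot, matching the two cases described in the comment.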

Update at 3pm GMT 10/12/2021:
As one might expect, the scope of the optimization in this ticket is quite limited: it applies only to top-level field paths. We walk through the group-by expression(s) and the accumulator expressions, and whenever we find a top-level field path, we push it down to the collection scan. For everything else, we rely on generateExpression generating traverse trees. We would then eliminate intermediate mkbson stages when $group or $project refers only to top-level fields, and ask the child stages not to return a mkbson object. Right now, the current work is more limited than this proposed scenario, so we want to extend its applicability a bit further.
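The walk described above can be sketched roughly as follows. This is a simplified model under stated assumptions: $group is reduced to a hypothetical GroupSpec holding plain field-path strings (real expressions are trees), and PushdownPlan and planPushdown are illustrative names, not server code. Top-level paths are collected for the scan; any other path (such as a dotted one) keeps the materialized child document.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified $group: group-by field paths plus the field-path
// argument of each accumulator (e.g. {$sum: "$x"} contributes "$x").
struct GroupSpec {
    std::vector<std::string> groupByPaths;
    std::vector<std::string> accumulatorPaths;
};

// Hypothetical result of the walk.
struct PushdownPlan {
    std::set<std::string> scanFields;  // top-level fields the scan emits as slots
    bool canSkipMkBson;                // true only if every path was top-level
};

// True for "$a"; false for dotted paths like "$a.b" and for variables like "$$ROOT".
bool isTopLevelFieldPath(const std::string& path) {
    return path.size() > 1 && path[0] == '$' && path[1] != '$' &&
           path.find('.') == std::string::npos;
}

PushdownPlan planPushdown(const GroupSpec& spec) {
    PushdownPlan plan{{}, true};
    auto visit = [&plan](const std::vector<std::string>& paths) {
        for (const auto& p : paths) {
            if (isTopLevelFieldPath(p)) {
                plan.scanFields.insert(p.substr(1));  // drop the leading '$'
            } else {
                // Not a top-level path: it would be handled by a traverse over
                // the document, so the child's mkbson output is still needed.
                plan.canSkipMkBson = false;
            }
        }
    };
    visit(spec.groupByPaths);
    visit(spec.accumulatorPaths);
    return plan;
}
```

For a group on "$a" with accumulators over "$x" and "$y", the sketch asks the scan for {a, x, y} and allows the mkbson stage to be skipped; adding an accumulator over "$b.c" still pushes "a" down but keeps the mkbson stage.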



 Comments   
Comment by Ethan Zhang (Inactive) [ 18/Jan/22 ]

This was removed from PM-2267.

Comment by Ian Boros [ 12/Oct/21 ]

yoonsoo.kim Ah, I meant my comment was written at the same time you were updating the description. I didn't touch the description myself.

Comment by Yoon Soo Kim [ 12/Oct/21 ]

ian.boros, I don't see, and didn't see, any updates from you. Could you update the description if any information is still missing?

Comment by Yoon Soo Kim [ 11/Oct/21 ]

In response to svilen.mihaylov's and pawel.terlecki's opinions, I changed the status to "needs scheduling".

Comment by Pawel Terlecki [ 11/Oct/21 ]

+1. I feel we want to keep the effort relatively small, even if we cannot get all of the performance. The query optimizer will help with that.

Generated at Thu Feb 08 05:50:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.