  1. Core Server
  2. SERVER-60608

Avoid re-materializing the entire document between filter/project/sort/limit/skip and group in common cases

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Execution

      We should be able to eliminate mkbson stage(s) to avoid unnecessary materialization when multiple pipeline stages are pushed down together to SBE.

      So far, we have been able to avoid the mkbson stage directly on top of a collection scan (SERVER-60101) and the mkbson stage between two consecutive group stages (SERVER-60484).

      We need to generalize the idea so that it can be applied to other plan shapes too.

      Ian's thought on a project stage:

      Another thing I was just thinking we'll need is a way for a stage to indicate that a field "might" exist but has to be looked up in a document. For example, if we have the exclusion projection {a: 0, b: 0} and a subsequent stage needs field "c", it will need to know that it has to look it up in an object somewhere. At the same time, if a subsequent stage needs field "a", it should be able to find out at query compile time that "a" does not exist. How to represent all of this in the stage builder, I'm not sure yet. Maybe each field maps to a std::variant or something like that. I'm just spitballing here though.
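One way to picture the std::variant idea is the minimal sketch below. Every name in it (FieldState, InSlot, LookupInObject, KnownMissing, FieldStates, applyExclusion, and the integer SlotId) is hypothetical and chosen only for illustration; none of them is an actual SBE stage-builder type. It assumes each stage can hand its parent a per-field state plus a fallback for fields it knows nothing specific about.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <variant>

// All names here are hypothetical; this is not the real SBE stage builder.
using SlotId = std::int64_t;

// The field's value is already available in a slot.
struct InSlot { SlotId slot; };
// The field might exist, but must be looked up in the object held in objectSlot.
struct LookupInObject { SlotId objectSlot; };
// The field is known at query compile time not to exist.
struct KnownMissing {};

using FieldState = std::variant<InSlot, LookupInObject, KnownMissing>;

class FieldStates {
public:
    // Fields without an explicit entry fall back to a lookup in resultObjectSlot.
    explicit FieldStates(SlotId resultObjectSlot)
        : _fallback(LookupInObject{resultObjectSlot}) {}

    void set(const std::string& field, FieldState state) {
        _known.insert_or_assign(field, state);
    }

    FieldState get(const std::string& field) const {
        auto it = _known.find(field);
        return it == _known.end() ? _fallback : it->second;
    }

private:
    std::map<std::string, FieldState> _known;
    FieldState _fallback;
};

// Exclusion projection: excluded fields are known to be missing, and every
// other field has to be looked up in the projected output object.
FieldStates applyExclusion(const std::set<std::string>& excluded, SlotId outputObjectSlot) {
    FieldStates out(outputObjectSlot);
    for (const auto& field : excluded) {
        out.set(field, KnownMissing{});
    }
    return out;
}

// Mirrors the {a: 0, b: 0} example from the comment above.
void exampleUsage() {
    FieldStates states = applyExclusion({"a", "b"}, /*outputObjectSlot=*/7);
    assert(std::holds_alternative<KnownMissing>(states.get("a")));
    assert(std::holds_alternative<LookupInObject>(states.get("c")));
    assert(std::get<LookupInObject>(states.get("c")).objectSlot == 7);
}
```

Under this sketch, a later stage that needs "c" would see LookupInObject and emit a field lookup against slot 7, while a stage that needs "a" could be compiled as if the field were missing.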

      Update at 3pm GMT 10/12/2021:
      The scope of the optimization in this ticket is, as one would expect, quite limited: it applies only to top-level field paths. We walk through the group-by expression(s) and accumulation expressions, and whenever we find a top-level field path, we push it down to the collection scan. Other than that, we rely on generateExpression to generate traverse trees. We will eliminate intermediate mkbson stages when $group or $project refers only to top-level fields, and then ask the children not to return a mkbson object. Right now, the current work is more limited than this proposed scenario, so we want to extend its applicability a little further.
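The walk described in the update can be sketched roughly as follows. The Expr struct and the helpers field, op, collectTopLevelFields, and fieldsForGroup are hypothetical simplifications, not MongoDB's Expression classes or the real stage builder. The sketch also assumes that any dotted path means the whole document is still needed, which it signals by returning std::nullopt.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified expression node; not MongoDB's Expression class.
struct Expr {
    std::string fieldPath;                        // non-empty for a field-path leaf, e.g. "a" or "a.b"
    std::vector<std::shared_ptr<Expr>> children;  // operands of a non-leaf expression
};

std::shared_ptr<Expr> field(std::string path) {
    auto e = std::make_shared<Expr>();
    e->fieldPath = std::move(path);
    return e;
}

std::shared_ptr<Expr> op(std::vector<std::shared_ptr<Expr>> children) {
    auto e = std::make_shared<Expr>();
    e->children = std::move(children);
    return e;
}

// Adds every top-level field path under e to out. Returns false if a dotted
// (non-top-level) path was seen anywhere in the tree.
bool collectTopLevelFields(const Expr& e, std::set<std::string>& out) {
    if (!e.fieldPath.empty()) {
        if (e.fieldPath.find('.') != std::string::npos) {
            return false;
        }
        out.insert(e.fieldPath);
        return true;
    }
    bool onlyTopLevel = true;
    for (const auto& child : e.children) {
        onlyTopLevel = collectTopLevelFields(*child, out) && onlyTopLevel;
    }
    return onlyTopLevel;
}

// Returns the fields to request from the scan, or std::nullopt when (by this
// sketch's assumption) the full document still has to be materialized.
std::optional<std::set<std::string>> fieldsForGroup(
    const Expr& groupBy, const std::vector<std::shared_ptr<Expr>>& accumulatorArgs) {
    std::set<std::string> fields;
    bool onlyTopLevel = collectTopLevelFields(groupBy, fields);
    for (const auto& arg : accumulatorArgs) {
        onlyTopLevel = collectTopLevelFields(*arg, fields) && onlyTopLevel;
    }
    if (!onlyTopLevel) {
        return std::nullopt;
    }
    return fields;
}

void exampleUsage() {
    // Roughly {$group: {_id: "$a", total: {$sum: {$add: ["$b", "$c"]}}}}.
    auto fields = fieldsForGroup(*field("a"), {op({field("b"), field("c")})});
    const std::set<std::string> expected{"a", "b", "c"};
    assert(fields.has_value());
    assert(*fields == expected);

    // A dotted group-by path falls outside the top-level-only case.
    auto none = fieldsForGroup(*field("d.e"), {});
    assert(!none.has_value());
}
```

When a field set is returned, the scan could expose just those fields in slots and the intermediate mkbson stage could be skipped; when std::nullopt is returned, the plan would keep materializing the object as before.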

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            yoonsoo.kim@mongodb.com Yoon Soo Kim (Inactive)
            Votes:
            0
            Watchers:
            10

              Created:
              Updated:
              Resolved: