Remove recursion from view resolution step in run_aggregate

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Even though we treat and speak of view resolution as a discrete step in the sequence of executing an aggregation, the implementation works recursively, with these steps:

      1. enter runAggregate()
      2. perform some initial operations/validation
      3. acquire information about the namespace from the catalog
      4. if the namespace is a view, go back to step (1) with the resolved view and new underlying namespace
      5. parse/prepare/optimize the pipeline
      6. convert pipelines to executors and execute the query

      We resolve the view entirely at once (meaning if there are nested views, they're all resolved at once), so the max recursion depth is 1. That means we have the opportunity to remove the recursion and instead refactor the code in a way that mirrors the way we speak about view resolution as a discrete step:

      1. enter runAggregate()
      2. perform some initial operations/validation
      3. acquire information about the namespace from the catalog
      4. if the namespace is a view:
        1. resolve the view and get underlying namespace
        2. perform whichever operations/validations should be duplicated from (2) with the resolved pipeline
        3. acquire information about the underlying namespace from the catalog
      5. parse/prepare/optimize the pipeline
      6. convert pipelines to executors and execute the query

       

      Why?

      The primary benefits I see here are (1) having the implementation match the way we speak about view resolution as a discrete step and as a byproduct (2) being very clear about what operations do or don't need to be repeated once the view is resolved. Right now, any code placed at the top of runAggregate() will be run twice; but we should instead be intentional about which code actually needs to be run again after the view is resolved.

      This change would not have been easy prior to SERVER-82228, but now that we have the AggregationExecutionContext tracking the agg state, this change is much more doable. I've posted a quick incomplete PoC here. Note the PoC assumes a chunk of SERVER-93536 will be done first too.

            Assignee:
            Unassigned
            Reporter:
            Will Buerger
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: