Make expensive graph dependency analysis logic conditional


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      The pipeline dependency graph eagerly computes field-level and stage-level path dependencies.

      In some tests this accounts for 65% of the time spent building the graph (measured by profiling PipelineOptimizationBMFixture/BM_BuildDependencyGraph/1000), because of the APIs it needs to call: DocumentSource::getDependencies and Expression::getDependencies.

      However, we don't always need this information, and it is likely that it is often computed, discarded and re-computed when the graph resizes.

      We only need the dependency information for:

       - getDeadFields - for dead code elimination

       - SERVER-127536 - Pipeline::getDependencies replacement

      We can create a fast path in the graph by specifying whether we care about field-level dependencies. If we don't, we can bypass the DepsTracker and use a constant, conservative FieldDependencies (one that reports RNG and wholeDoc deps).

      In this way, we will avoid tracking dependencies during normal pipeline rewrites which might not care about them.
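
      A minimal sketch of what such a fast path could look like follows. It is not the server's actual code: FieldDependencies is modelled here as a plain struct, and DependencyTracking, Stage, DependencyGraph, conservativeDependencies and computeDependencies are hypothetical names introduced only for illustration. The expensive per-stage call stands in for DocumentSource::getDependencies.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the graph's per-stage dependency record.
struct FieldDependencies {
    std::set<std::string> fields;
    bool needsWholeDocument = false;
    bool needsRandomGenerator = false;
};

// Conservative constant: assume the stage may read the whole document and use
// the RNG, so any rewrite that consults it stays safe.
inline const FieldDependencies& conservativeDependencies() {
    static const FieldDependencies kConservative{{}, true, true};
    return kConservative;
}

// Hypothetical switch chosen by whoever builds the graph.
enum class DependencyTracking { kFieldLevel, kConservative };

// Hypothetical stage: computeDependencies models the expensive call
// (DocumentSource::getDependencies in the ticket).
struct Stage {
    std::string name;
    std::function<FieldDependencies()> computeDependencies;
};

class DependencyGraph {
public:
    explicit DependencyGraph(DependencyTracking mode) : _mode(mode) {}

    void addStage(const Stage& stage) {
        if (_mode == DependencyTracking::kFieldLevel) {
            // Slow path: walk the stage and its expressions.
            _deps.push_back(stage.computeDependencies());
        } else {
            // Fast path: skip dependency tracking and copy the constant.
            _deps.push_back(conservativeDependencies());
        }
    }

    const FieldDependencies& dependenciesOf(std::size_t index) const {
        return _deps.at(index);
    }

private:
    DependencyTracking _mode;
    std::vector<FieldDependencies> _deps;
};
```

      In this sketch, callers such as getDeadFields would build the graph with kFieldLevel, while ordinary rewrites would use kConservative and never invoke the expensive call.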

      We should only make this change if graph building actually proves to be too slow and we haven't, in the meantime, added other code that requires the field-level deps.

            Assignee:
            Unassigned
            Reporter:
            Vesko Karaganev
            Votes:
            0
            Watchers:
            3
