Make $graphLookup use StageParam subpipeline instead of LPP

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Integration
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Status: Deferred — investigation writeup for future work

      An implementation of this ticket was attempted (patch 6a2964eb4821f300074b8f98) following the SERVER-127884 pattern. CI revealed that the approach produces silent wrong results for nested views, and that the correct migration requires infrastructure that does not exist yet. This comment documents the findings so the future implementation doesn't rediscover them.

      Background

      SERVER-127884 modeled $lookup/$unionWith subpipelines as StageParams: the lite-parsed stage's getStageParams() produces subpipelineStageParams (std::vector<std::unique_ptr<StageParams>>), and the DocumentSource constructor consumes them exactly once via Pipeline::parseFromStageParams() into a built Pipeline stored in shared state. The non-copyable unique_ptr vector is transient — gone by the end of construction.

      This ticket asked for the same treatment of $graphLookup's view subpipeline (GraphLookUpStageParams::liteParsedPipeline, an OwnedLiteParsedPipeline).

      Why $graphLookup is structurally different from $lookup/$unionWith

      $graphLookup cannot consume its subpipeline once at construction. GraphLookUpParams::fromLpp is a durable, copyable, mutable, re-parsed-per-use template:

      1. exec::agg::GraphLookUpStage::makePipeline() (graph_lookup_stage.cpp) — every BFS frontier expansion clones the LPP, appends a per-iteration $match, and re-parses to a fresh Pipeline.
      2. The sharded-view retry path in makePipeline() replaces fromLpp with the resolved view definition from CommandOnShardedViewNotSupportedOnMongod.
      3. serializeToArray() re-parses it to emit $_internalFromPipeline for router→shard dispatch.
      4. addInvolvedCollections() re-parses it to walk involved namespaces.
      5. doInitialize() desugars it and feeds getInvolvedNamespaces() into _fromExpCtx.

      The codebase's existing "cloneable, re-parseable, owns-its-BSON pipeline template" type is exactly OwnedLiteParsedPipeline — which is why fromLpp is shaped the way it is.

      The attempted approach and why it is unfixable as designed

      The attempt replaced the LPP with subpipelineStageParams and had DocumentSourceGraphLookUp::createFromStageParams() rebuild the LPP by extracting each stage's original BSON (StageParams::getOriginalBson()) and re-lite-parsing (OwnedLiteParsedPipeline::fromStageParams() helper).

      CI result: 8 task failures across extensions_ and mongot_e2e_ suites:

      • {
        Unknown macro: {jstests/extensions/extension_in_view_with_graph_lookup.js}

        }

      • {
        Unknown macro: {jstests/extensions/nested_subpipeline_combinations.js}

        }

      • {
        Unknown macro: {jstests/with_mongot/e2e/views/mongot_stage_in_view_definition_graph_lookup.js}

        }

      All fail identically: expected 8 to equal 7 — and only in the nested-view cases. Scenario: $graphLookup whose from is a view defined as [{$skip: N}, {$unionWith: {coll: nestedView, pipeline: []]}}, where nestedView is itself a view filtering through an extension/mongot stage. The inner view's filter is silently dropped — one extra document traversed. No error, no crash: silent wrong results.

      Root cause: nested-view resolution state is applied by the recursive view resolver (pipeline_resolver.cpp::resolveInvolvedNamespacesImpl, via bindViewInfo()/getMutableSubPipelines()) onto the live LiteParsed stage objects. That bound state (e.g. LiteParsedUnionWith's inner-view pipeline) exists only in those objects — it is not present in the stage's original BSON. Rebuilding an LPP from original BSON and re-lite-parsing produces unbound stages, and nothing re-binds them at DocumentSource-construction time. Master works because getStageParams() deep-clones the bound LPP (OwnedLiteParsedPipeline's copy constructor clones each stage, preserving bindings).

      Rejected mitigations:

      • Re-running view binding on the rebuilt LPP inside createFromStageParams() — requires invoking pipeline_resolver recursion (cycle detection via inProgress, mongot special-casing) from the DocumentSource layer. Wrong layer, duplicated machinery, fragile.
      • Carrying both the LPP and the StageParams vector — dead-weight duplication; the StageParams would have no consumer.

      Conclusion: any original-BSON round-trip is semantically lossy for stages whose lite-parse state is mutated after parse. The only correct carrier across the phase boundary is one that preserves the rich per-stage state — i.e. the StageParams objects themselves, consumed natively by the registry, never flattened to BSON.

      Design for the future implementation

      Step 1 — StageParams::clone(). Virtual deep copy returning owned-BSON StageParams. Trivial for DECLARE_STAGE_PARAMS_DERIVED_DEFAULT stages (~45 of them, one macro change); hand-written for LookUpStageParams/UnionWithStageParams/GraphLookUpStageParams (extra fields + recursive subpipelineStageParams). This is what makes a unique_ptr<StageParams> vector copyable and preserves rich state (resolvedSubPipelineView, nested params) across copies — solving the exact problem the BSON round-trip cannot.

      Step 2 — remove fromLpp. Store boost::optional<std::vector<std::unique_ptr<StageParams>>> directly in GraphLookUpParams; hand-written copy ctor clones elements. Rework the use sites:

      • makePipeline(): clone stored StageParams → Pipeline::parseFromStageParams() → append the per-iteration $match as a DocumentSource post-build. Dispatching through the StageParams→DocumentSource registry preserves bound state (this is how $unionWith inside the view pipeline keeps its inner-view resolution — UnionWithStageParams carries it).
      • Sharded-view retry: lite-parse the resolved-view BSON from the exception → getStageParams() → replace the stored vector (parity with today's behavior).
      • serializeToArray() ($_internalFromPipeline) and addInvolvedCollections(): build a throwaway pipeline from cloned StageParams and serialize/walk its sources.
      • doInitialize(): LiteParsedDesugarer::desugar() + getInvolvedNamespaces() are LPP-centric. Either desugar before getStageParams() at lite-parse time and carry the involved-namespace set on the params, or derive it from a built pipeline. This is the part needing the most care.

      Step 3 — merge GraphLookUpParams and GraphLookUpStageParams. They are near-duplicates; after steps 1–2 the only irreducible difference is startWith (raw BSONElement at lite-parse vs parsed Expression, which needs an ExpressionContext). The merged type keeps startWith raw; DocumentSourceGraphLookUp parses it into a separate member in its constructor; required-field validation moves there too. createFromStageParams() shrinks to validate + parse startWith + construct.

      Groundwork that is independently landable (already implemented and green)

      • Collapse DefaultStageParams into StageParams: base gains a default (EOO) ctor, explicit StageParams(BSONElement), and a concrete getOriginalBson(). Extension params (ExpandableStageParams/ExpandedStageParams) and InternalSearchIdLookupStageParams use the EOO default; ScoreFusion/RankFusion/IDRMS params pass their owned BSON to the base. Removes the dynamic_cast<DefaultStageParams*> pattern.
      • SERVER-121262 owned-BSON pattern in GraphLookUpStageParams (BSONObj ctor param + _ownedOriginalBson member).
      • Test fix: pipeline_length_limit.js referenced nonexistent ErrorCodes.MaxPipelineLengthExceeded; replaced with numeric 12788402 (unique numeric codes per review guidance).

      Regression coverage

      The future implementation's canary tests are exactly the ones that caught this:
      jstests/extensions/extension_in_view_with_graph_lookup.js ("nested view with extension stage in subpipeline" case), jstests/extensions/nested_subpipeline_combinations.js, and jstests/with_mongot/e2e/views/mongot_stage_in_view_definition_graph_lookup.js. Any approach that flattens the subpipeline to BSON will fail them with off-by-one connection counts.

      Error codes for this ticket use the 127906XX range (verify uniqueness against both branch and patch base before use).

            Assignee:
            Unassigned
            Reporter:
            Finley Lau
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: