Provide Dependency Graph API for richer field provenance tracking


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      Context

      Join ordering currently uses the PathResolver class, which resolves the origin of each field referenced in a join predicate:

      https://github.com/10gen/mongo/blob/8df5e4232558041076a85d9d8ec676f43342ff2b/src/mongo/db/query/compiler/optimizer/join/path_resolver.h#L59

      PathResolver calls DependencyGraph::getPrevModifyingStage to check whether a field originates from a $lookup, from a base collection field, or from some other stage.

      For $lookup, the PathResolver performs path prefix operations to determine the field's name inside the $lookup sub-pipeline, and then calls into the graph again with that name.
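
      To illustrate the kind of path prefix operation described above, here is a minimal sketch. It assumes the prefix being removed is the $lookup "as" field, and it uses plain strings instead of FieldPath. The helper name stripLookupPrefix is hypothetical and is not the actual PathResolver code.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of PathResolver): if `path` lies strictly
// under the $lookup `as` field, return the remainder of the path, i.e. the
// name the field would have inside the $lookup sub-pipeline.
std::optional<std::string> stripLookupPrefix(const std::string& path,
                                             const std::string& asField) {
    // The path must be longer than "<asField>." to have a non-empty remainder.
    if (path.size() <= asField.size() + 1) {
        return std::nullopt;
    }
    // The path must start with the `as` field name...
    if (path.compare(0, asField.size(), asField) != 0) {
        return std::nullopt;
    }
    // ...followed by a dot, so that "ordersX.total" does not match "orders".
    if (path[asField.size()] != '.') {
        return std::nullopt;
    }
    return path.substr(asField.size() + 1);
}
```

      With this sketch, the path "orders.total" under an "as" field of "orders" yields "total", which is the name that would then be passed back into the graph.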

      Problem

      We would also like join ordering to work with non-reshaping renames.

      Classifying a rename as non-reshaping requires knowing the arrayness of the path prefix, along with various other checks. The dependency graph already performs these checks internally (for arrayness and constant propagation), so callers should not have to reason about them separately.

      Solution

      We can add a method such as:

      enum class FieldOriginKind {
        kBaseCollection,
        kAlias,
        kComputed,
        kSubpipeline,
      };
      struct FieldOrigin {
        FieldOriginKind kind;
        const DocumentSource* stage;
        FieldPath name;
      };
      
      // Resolves the field to its origin.
      FieldOrigin resolveFieldOrigin(const DocumentSource* stage, PathRef path) const;
      

      An API like this can report not only the stage that last modified a field, but also the kind of modification and, for kAlias and kSubpipeline, the field's previous known name.

      The caller can then simply call it again to resolve another hop:

      if (origin.kind == FieldOriginKind::kAlias) {
        origin = graph->resolveFieldOrigin(stage, origin.name);
      }
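
      As a rough sketch of how a caller might follow several rename hops with the proposed API, the example below uses a toy graph. DocumentSource and FieldPath are simplified stand-ins for the real types, and ToyDependencyGraph, record, resolveThroughAliases and the maxHops limit are all hypothetical names introduced for illustration. Treating an unrecorded field as kBaseCollection is also an assumption of the toy, not something the proposal specifies.

```cpp
#include <map>
#include <string>
#include <utility>

// Simplified stand-ins for the real types (assumptions for this sketch).
struct DocumentSource {
    std::string name;
};
using FieldPath = std::string;

enum class FieldOriginKind { kBaseCollection, kAlias, kComputed, kSubpipeline };

struct FieldOrigin {
    FieldOriginKind kind;
    const DocumentSource* stage;  // modifying stage; nullptr in this toy for base fields
    FieldPath name;               // previous known name for kAlias / kSubpipeline
};

// Toy graph: origins are recorded by hand instead of being derived from a pipeline.
class ToyDependencyGraph {
public:
    void record(const DocumentSource* stage, FieldPath path, FieldOrigin origin) {
        _origins[stage][std::move(path)] = std::move(origin);
    }

    // Same shape as the proposed method; unrecorded fields are treated as
    // base collection fields (an assumption of this toy).
    FieldOrigin resolveFieldOrigin(const DocumentSource* stage, const FieldPath& path) const {
        auto byStage = _origins.find(stage);
        if (byStage != _origins.end()) {
            auto byPath = byStage->second.find(path);
            if (byPath != byStage->second.end()) {
                return byPath->second;
            }
        }
        return FieldOrigin{FieldOriginKind::kBaseCollection, nullptr, path};
    }

private:
    std::map<const DocumentSource*, std::map<FieldPath, FieldOrigin>> _origins;
};

// Hypothetical caller-side helper: keep resolving while the origin is an
// alias, with a hop limit as a guard against cyclic data.
FieldOrigin resolveThroughAliases(const ToyDependencyGraph& graph,
                                  const DocumentSource* stage,
                                  const FieldPath& path,
                                  int maxHops = 16) {
    FieldOrigin origin = graph.resolveFieldOrigin(stage, path);
    for (int hop = 0; hop < maxHops && origin.kind == FieldOriginKind::kAlias; ++hop) {
        origin = graph.resolveFieldOrigin(stage, origin.name);
    }
    return origin;
}
```

      If "c" is recorded as an alias of "b" and "b" as an alias of "a", resolveThroughAliases(graph, stage, "c") ends at "a" with kind kBaseCollection. The loop mirrors the snippet above by passing the same stage on every hop; whether the real API should instead continue from origin.stage is left open here.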
      

            Assignee:
            Vesko Karaganev
            Reporter:
            Vesko Karaganev