Try to avoid call to cluster_aggregation_planner::getCollationAndUUID

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • 3

      In a sharded environment, the throughput of an aggregation with {$match: {_id: x}} is about 25x lower than that of the equivalent find command. I've attached flamegraphs comparing the two workloads.

      In the agg case, 91% of the time is spent in cluster_aggregation_planner::getCollationAndUUID, since it has to make a remote call to the primary shard to retrieve that metadata. We believe that, at least in cases where we're only parsing the pipeline for the sake of query stats, the UUID is optional and the collation can default to an empty object. We should try to avoid that call so that the cluster_aggregate IDHack path isn't bottlenecked there.
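
      A minimal sketch of the idea, not the server's actual code: every type and function name below (CollationAndUuid, ParsePurpose, RemoteFetch, resolveCollationAndUuid) is hypothetical, and the remote call is modeled as a plain callable. It only illustrates skipping the round trip to the primary shard when the pipeline is parsed for query stats, falling back to an empty collation object and an unset UUID.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the metadata returned by the remote call.
struct CollationAndUuid {
    std::string collation = "{}";      // default: empty collation object
    std::optional<std::string> uuid;   // default: no UUID
};

// Hypothetical flag describing why the pipeline is being parsed.
enum class ParsePurpose { kExecution, kQueryStatsOnly };

// Stand-in for the remote round trip to the primary shard.
using RemoteFetch = std::function<CollationAndUuid()>;

// Returns defaults without any remote call when parsing only for query
// stats; otherwise performs the (expensive) fetch.
CollationAndUuid resolveCollationAndUuid(ParsePurpose purpose,
                                         const RemoteFetch& fetchFromPrimaryShard) {
    if (purpose == ParsePurpose::kQueryStatsOnly) {
        return CollationAndUuid{};  // assumed safe only for query stats
    }
    assert(fetchFromPrimaryShard && "execution path needs a fetch callable");
    return fetchFromPrimaryShard();
}
```

      Whether the defaults are also safe on the execution path is a separate question; the sketch deliberately keeps the remote fetch there.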

        1. flamegraph_agg_idhack.svg
          1.23 MB
        2. flamegraph_find_idhack.svg
          2.19 MB

              Assignee:
              [DO NOT USE] Backlog - Query Integration
              Reporter:
              Will Buerger
              Votes:
              0
              Watchers:
              3
