  1. Core Server
  2. SERVER-82566

Try to avoid call to cluster_aggregation_planner::getCollationAndUUID

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      In a sharded environment, an aggregation with a single {$match: {_id: x}} stage achieves about 25x lower throughput than the equivalent find command. I've attached flamegraphs comparing the two workloads.
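
      For reference, the two workloads can be written as command documents along these lines. This is a minimal sketch: the collection name passed in and the helper names are assumptions for illustration and are not taken from the benchmark itself.

```python
from typing import Any, Dict


def find_by_id_command(collection: str, doc_id: Any) -> Dict[str, Any]:
    """Build a find command that matches a single document by _id."""
    return {"find": collection, "filter": {"_id": doc_id}}


def aggregate_by_id_command(collection: str, doc_id: Any) -> Dict[str, Any]:
    """Build the equivalent aggregate command with a single $match stage."""
    return {
        "aggregate": collection,
        "pipeline": [{"$match": {"_id": doc_id}}],
        "cursor": {},  # aggregate needs a cursor document unless run with explain
    }
```

      Both documents express the same point lookup by _id; only the command shape differs.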

      In the agg case, 91% of the time is spent in cluster_aggregation_planner::getCollationAndUUID, because it has to make a remote call to the primary shard to retrieve that metadata. We believe that, at least when we are only parsing the pipeline for the sake of query stats, the UUID is optional and the collation can default to an empty object. We should try to avoid that call so that the cluster_aggregate IDHack path isn't bottlenecked there.
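
      The idea can be sketched roughly as follows. This is an illustration in Python, not the server's C++ code: every name here (CollationAndUUID, resolve_collation_and_uuid, the parsing_for_query_stats_only flag, the fetch callback) is hypothetical, and it assumes the belief stated above holds, namely that an empty collation and a missing UUID are acceptable when parsing only for query stats.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class CollationAndUUID:
    """Hypothetical stand-in for the metadata pair returned by the planner."""
    collation: Dict[str, Any] = field(default_factory=dict)  # empty object by default
    uuid: Optional[str] = None  # treated as optional in this sketch


def resolve_collation_and_uuid(
    parsing_for_query_stats_only: bool,
    fetch_from_primary_shard: Callable[[], CollationAndUUID],
) -> CollationAndUUID:
    """Skip the remote round trip when defaults are assumed to be sufficient."""
    if parsing_for_query_stats_only:
        # Assumption from the ticket: no UUID and an empty collation are enough here.
        return CollationAndUUID()
    # Otherwise fall back to the (expensive) remote call.
    return fetch_from_primary_shard()
```

      Passing the remote fetch as a callback keeps the expensive call lazy, so it runs only on the path that actually needs the metadata.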

        1. flamegraph_agg_idhack.svg
          1.23 MB
          Will Buerger
        2. flamegraph_find_idhack.svg
          2.19 MB
          Will Buerger

            Assignee:
            backlog-query-integration [DO NOT USE] Backlog - Query Integration
            Reporter:
            will.buerger@mongodb.com Will Buerger
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: