Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-88049

Consider using tournament tree for MergeIterator::advance

    • Type: Icon: Task Task
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution

      From https://en.wikipedia.org/wiki/K-way_merge_algorithm#Direct_k-way_merge

      "The heap is more commonly used, although a tournament tree is faster in practice. A heap uses approximately 2*log(k) comparisons in each step because it handles the tree from the root down to the bottom and needs to compare both children of each node. Meanwhile, a tournament tree only needs log(k) comparisons because it starts on the bottom of the tree and works up to the root, only making a single comparison in each layer. The tournament tree should therefore be the preferred implementation."

      The scope of this task is to determine how much benefit this could offer. What workloads hit this codepath and what percentage of time is spent in the k-way merge? Consider flamegraphing index creation on the bestbuy agg workload, for instance. Note that since external sort is a two-phase process, make sure to collect samples during the merge phase, not the sort phase. Attached is a flamegraph collected during the sort phase. From what I can tell, after the sort and spill phase completes, the spilled files will run through the k-way sort, which currently uses a heap. See the mongo::SortedDataIndexAccessMethod::BulkBuilderImpl callstack.

      Also could consider read queries that perform a sort on data that doesn't fit into memory. 
      building-index-for-bestbuy-agg-high-value-workload.svg

      two-mins-building-index-bestbuy_agg.svgwas collected over 2 minutes during the whole execution time of mongorestore-ing bestbuy data. Scanning the collection dominates the execution time. The MergeIterator::advance is about 3.5% of the samples underneath mongo::IndexBuildsCoordinator::_buildIndex.

      If the tournament tree solution could speed up advance by a factor of 2, we'd expect to see roughly a 1.75% speedup on this particular index build. Worth noting this may be a unique index build, it would be worth looking at unique and non-unique index builds separately.

      On building an index on 32 million {a:2} documents, MergeIterator::advance takes about 4.25% of the samples. Again, assuming a tournament tree solution would provide 2x speedup on MergeIterator::advance, this would be a bit over 2% speedup in this case as well.

      SERVER-60331 keeps the number of sorted runs low, which limits how much effect the suggested change would have.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            evan.bergeron@mongodb.com Evan Bergeron
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: