Improve sampling CE performance in the presence of large $in lists

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      Sampling CE performance degrades as the size of the $in list grows. I tested $in arrays of up to 50k elements: at that size, sampling takes nearly 700ms, compared with 20ms for multi-planning. The degradation appears to be linear, as shown in the attached charts. The charts plot a set of queries, numbered along the x-axis. Each query has a conjunctive predicate on the mark and salary fields: the mark predicate is an $in list and the salary predicate is a range. Each query is run four times: once with the mark index hinted, once with the salary index hinted, and twice with no hint, once using sampling CE (red line) and once using multi-planning (green line). Sampling CE is much slower than multi-planning, and the attached flame graph shows a significant amount of time spent in sampling CE code.
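      To make the cost argument concrete, here is a toy sampling-based selectivity estimator for the query shape described above. It is a hypothetical illustration, not MongoDB's sampling CE implementation, and it does not diagnose what the flame graph shows. The names make_filter, estimate_selectivity, and make_sample, the uniform synthetic data, and the sample size are all assumptions. It only shows that if the $in check costs O(k) per sampled document, the total estimate costs O(sample size * k), which grows linearly with the list size k for a fixed sample; hashing the list once changes that to O(k + sample size).

```python
import random
import time

# Hypothetical illustration only: this is NOT MongoDB's sampling CE code.
# It models the query shape from this ticket (an $in on `mark` AND a range
# on `salary`) and estimates selectivity as the fraction of sampled
# documents that match the filter.


def make_filter(marks, lo, hi):
    """MongoDB-style filter dict: {mark: {$in: [...]}, salary: {$gte: lo, $lt: hi}}."""
    return {"mark": {"$in": list(marks)}, "salary": {"$gte": lo, "$lt": hi}}


def estimate_selectivity(sample, flt, hashed=False):
    """Return the fraction of sampled docs that match `flt`.

    hashed=False: membership is tested against the raw list, so each
    document costs O(k) for an $in list of size k, and the whole estimate
    costs O(len(sample) * k).
    hashed=True: the list is converted to a set once (O(k)); after that,
    each membership test is O(1) on average.
    """
    in_vals = flt["mark"]["$in"]
    if hashed:
        in_vals = set(in_vals)
    lo, hi = flt["salary"]["$gte"], flt["salary"]["$lt"]
    hits = sum(1 for d in sample if d["mark"] in in_vals and lo <= d["salary"] < hi)
    return hits / len(sample)


def make_sample(n, seed=0):
    """Synthetic, uniformly distributed documents (an assumption, not real data)."""
    rng = random.Random(seed)
    return [{"mark": rng.randrange(100_000), "salary": rng.randrange(200_000)}
            for _ in range(n)]


if __name__ == "__main__":
    sample = make_sample(500)
    for k in (1_000, 10_000, 50_000):
        # An $in list of k distinct even marks plus a salary range.
        flt = make_filter(range(0, 2 * k, 2), 50_000, 150_000)
        for hashed in (False, True):
            t0 = time.perf_counter()
            sel = estimate_selectivity(sample, flt, hashed)
            ms = (time.perf_counter() - t0) * 1000
            print(f"k={k:>6} hashed={hashed!s:<5} selectivity={sel:.3f} {ms:8.1f} ms")
```

      Running the script prints the estimated selectivity and elapsed time for each list size. The two variants return identical estimates, while the hashed=False timings grow roughly with k. Actual timings depend on the machine, so none are quoted here.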

        1. screenshot-1.png (124 kB)
        2. flamegraph-png.png (177 kB)

            Assignee: Unassigned
            Reporter: Jess Balint
            Votes: 0
            Watchers: 3
