Improve sampling CE performance in the presence of large $in lists

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Blocker - P1
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      Sampling CE performance suffers as the $in list size increases. I tested with arrays of up to 50k elements; at that size sampling takes nearly 700 ms, compared to 20 ms with multi-planning. The degradation appears to be linear in the list size, as shown in the charts.

      The charts cover a set of queries, numbered along the bottom axis. Each query has a conjunctive predicate on the mark and salary fields: the mark predicate is a $in list and the salary predicate is a range. Each query is run 4 times: once with the mark index hinted, once with the salary index hinted, once with no hint using samplingCE (red line), and once with no hint using multi-planning (green line).

      samplingCE is much slower than multi-planning, and the attached flame graph shows a significant amount of time spent in sampling CE code.
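
For reference, the query shape and the four runs per query described above can be sketched as plain Python dictionaries. This is only an illustration of the setup, not the script used for the measurements: the integer mark values, the salary bounds, the single-field index key patterns, and the `build_query` / `build_runs` helper names are all assumptions, and how the plan ranking mode is switched between sampling CE and multi-planning is deliberately left out.

```python
# Illustrative sketch only. Values, index key patterns and helper names are
# assumptions and are not taken from the ticket.

def build_query(in_list_size, salary_low, salary_high):
    """Conjunctive predicate: a $in list on `mark` AND a range on `salary`."""
    return {
        "mark": {"$in": list(range(in_list_size))},  # assumed integer marks
        "salary": {"$gte": salary_low, "$lt": salary_high},
    }


def build_runs(query):
    """The four runs per query: two hinted, two unhinted."""
    return [
        {"label": "hint mark index", "filter": query, "hint": {"mark": 1}},
        {"label": "hint salary index", "filter": query, "hint": {"salary": 1}},
        {"label": "no hint, sampling CE", "filter": query, "hint": None},
        {"label": "no hint, multi-planning", "filter": query, "hint": None},
    ]


# Largest $in list size mentioned in the description (50k elements).
query = build_query(50_000, 40_000, 60_000)
runs = build_runs(query)
for run in runs:
    print(run["label"], len(run["filter"]["mark"]["$in"]))
```

Keeping the filter identical across the four runs means any timing difference comes from the hint or the plan ranking mode, not from the predicate.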

        1. screenshot-1.png (124 kB)
        2. server108776-5000.svg (111 kB)
        3. server108776-50000.svg (118 kB)
        4. multiplanner times.png (54 kB)
        5. samplingce times.png (57 kB)

            Assignee:
            Unassigned
            Reporter:
            Jess Balint
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: