Improve sampling CE performance in the presence of large $in lists

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Blocker - P1
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      Sampling CE performance suffers as the $in list size increases. I tested with arrays of up to 50k elements, where sampling takes nearly 700ms compared to 20ms with multi-planning. The degradation appears to be linear, as shown in the attached charts.

      The charts cover a set of queries, numbered along the bottom. Each query has a conjunctive predicate on the mark and salary fields: the mark predicate is a $in list and the salary predicate is a range. Each query is run four times: once with the mark index hinted, once with the salary index hinted, once with no hint using samplingCE (red line), and once with no hint using multi-planning (green line).

      samplingCE is much slower than multi-planning, and the attached flame graph shows a significant amount of time spent in sampling CE code.
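      For anyone trying to reproduce this, the query shape described above can be sketched roughly as follows. This is only an illustration of the four runs per query, not the actual test harness: the field names mark and salary come from the description, while the integer $in values, the salary bounds, the single-field index key patterns used as hints, and the helper names build_filter and build_runs are all assumptions.

```python
from typing import Any, Optional


def build_filter(in_list_size: int, salary_low: int, salary_high: int) -> dict[str, Any]:
    """Conjunctive predicate: a $in list on `mark` and a range on `salary`."""
    return {
        # Assumption: integer values; the real data type of `mark` is not stated.
        "mark": {"$in": list(range(in_list_size))},
        # Assumption: half-open range with made-up bounds.
        "salary": {"$gte": salary_low, "$lt": salary_high},
    }


def build_runs(
    in_list_size: int,
    salary_low: int = 50_000,
    salary_high: int = 100_000,
) -> list[dict[str, Any]]:
    """Return the four runs per query: two hinted, two unhinted."""
    flt = build_filter(in_list_size, salary_low, salary_high)

    def run(label: str, hint: Optional[dict[str, int]]) -> dict[str, Any]:
        return {"label": label, "filter": flt, "hint": hint}

    return [
        # Assumption: single-field indexes {mark: 1} and {salary: 1} exist.
        run("mark index hinted", {"mark": 1}),
        run("salary index hinted", {"salary": 1}),
        # The two unhinted runs differ only in how the server ranks plans;
        # how that mode is switched is deliberately left out of this sketch.
        run("no hint, samplingCE", None),
        run("no hint, multi-planning", None),
    ]


if __name__ == "__main__":
    for r in build_runs(50_000):
        print(r["label"], len(r["filter"]["mark"]["$in"]), r["hint"])
```

      Each entry can then be passed to whatever driver or shell is in use (filter plus optional hint), timing the runs as the $in list size is varied.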

        1. multiplanner times.png
          54 kB
          Andi Wang
        2. samplingce times.png
          57 kB
          Andi Wang
        3. screenshot-1.png
          124 kB
          Jess Balint
        4. server108776-5000.svg
          111 kB
          Andi Wang
        5. server108776-50000.svg
          118 kB
          Andi Wang

            Assignee:
            Unassigned
            Reporter:
            Jess Balint
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: