Investigate performance of push down of low-selectivity per-path filters


    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Backwards Compatibility: Fully Compatible

      Zig-zag search naturally favors high-selectivity filters. However, if all filters have low selectivity, the performance of zig-zag search might be worse than iterating over the filtered columns in parallel. We should investigate how bad it becomes and whether we need to take any mitigating actions (e.g., replacing short-distance seeks with iteration, either inside the stage or at the storage level).
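
      To make the trade-off concrete, here is a minimal standalone sketch contrasting the two strategies over sorted record-id streams. The Cursor interface and all names are hypothetical, for illustration only; this is not the actual column_scan or storage API:

      #include <algorithm>
      #include <cstdint>
      #include <iostream>
      #include <limits>
      #include <vector>

      // Hypothetical cursor over one filtered column's sorted RecordIds.
      struct Cursor {
          const std::vector<int64_t>* ids;
          size_t pos = 0;
          bool exhausted() const { return pos >= ids->size(); }
          int64_t current() const { return (*ids)[pos]; }
          void next() { ++pos; }  // cheap: advance one entry
          // Per the experiments described below, roughly 2x the cost of next().
          void seekGE(int64_t target) {
              pos = std::lower_bound(ids->begin() + pos, ids->end(), target) - ids->begin();
          }
      };

      // Zig-zag intersection: repeatedly seek the lagging cursor up to the
      // current leader. Assumes at least two cursors. When every filter lets
      // most records through, the seeks degenerate into many short hops.
      std::vector<int64_t> zigzagIntersect(std::vector<Cursor> cs) {
          std::vector<int64_t> out;
          for (auto& c : cs) if (c.exhausted()) return out;
          size_t i = 0, matched = 1;
          int64_t target = cs[0].current();
          while (true) {
              i = (i + 1) % cs.size();
              cs[i].seekGE(target);
              if (cs[i].exhausted()) return out;
              if (cs[i].current() == target) {
                  if (++matched == cs.size()) {  // all cursors agree: emit a match
                      out.push_back(target);
                      cs[i].next();
                      if (cs[i].exhausted()) return out;
                      target = cs[i].current();
                      matched = 1;
                  }
              } else {  // overshot: restart agreement from the new leader
                  target = cs[i].current();
                  matched = 1;
              }
          }
      }

      // Alternative: iterate the filtered columns in parallel using only next().
      std::vector<int64_t> parallelIntersect(std::vector<Cursor> cs) {
          std::vector<int64_t> out;
          while (true) {
              int64_t mn = std::numeric_limits<int64_t>::max();
              int64_t mx = std::numeric_limits<int64_t>::min();
              for (auto& c : cs) {
                  if (c.exhausted()) return out;
                  mn = std::min(mn, c.current());
                  mx = std::max(mx, c.current());
              }
              if (mn == mx) {  // all cursors on the same RecordId
                  out.push_back(mn);
                  for (auto& c : cs) c.next();
              } else {         // catch the laggards up with nexts, not seeks
                  for (auto& c : cs)
                      while (!c.exhausted() && c.current() < mx) c.next();
              }
          }
      }

      int main() {
          std::vector<int64_t> a{1, 3, 5, 7}, b{3, 4, 5, 8};
          std::vector<Cursor> cols{{&a}, {&b}};
          for (int64_t id : zigzagIntersect(cols)) std::cout << id << ' ';   // 3 5
          std::cout << '\n';
          for (int64_t id : parallelIntersect(cols)) std::cout << id << ' '; // 3 5
          std::cout << '\n';
      }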

      From my experiments on small datasets, the cost of a "seek" is about twice the cost of a "next", which means the filters have to have really low selectivity to benefit from replacing seeks with nexts (on a collection with 10^5 docs, a query with 7 always-true filters and one filter that let through 30% of the records performed about the same with seeks as with nexts). Seeks would likely be relatively more expensive on indexes that don't fit into memory, but we'd still have to use heuristics in the column_scan stage to pick between the two approaches, such as the distance between the record ids, and I doubt such heuristics can be made reliable enough to be useful.
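
      If we did pursue a mitigation, it would presumably look something like the hypothetical helper below (reusing the Cursor sketch above): replace a seek with repeated next() calls whenever the estimated distance to the target is small. The threshold and the gap estimate are exactly the kind of heuristic this paragraph doubts can be made reliable:

      // Hypothetical: with seek ~2x the cost of next, seeking only pays off
      // when the expected number of skipped entries exceeds the threshold.
      // Estimating that distance from RecordId arithmetic assumes densely
      // packed ids, which real collections don't guarantee -- hence the
      // doubt about reliability.
      void advanceTo(Cursor& c, int64_t target, int64_t seekCostInNexts = 2) {
          int64_t estimatedGap = target - c.current();  // crude distance estimate
          if (estimatedGap > seekCostInNexts) {
              c.seekGE(target);
          } else {
              while (!c.exhausted() && c.current() < target) c.next();
          }
      }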

            Assignee:
            Irina Yatsenko (Inactive)
            Reporter:
            Irina Yatsenko (Inactive)
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: