[SERVER-68520] Investigate performance of push down of low-selectivity per-path filters
Created: 02/Aug/22  Updated: 19/Oct/22  Resolved: 19/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Irina Yatsenko (Inactive)
Assignee: Irina Yatsenko (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

Zig-zag search naturally favors high-selectivity filters. However, if all filters are low-selectivity, zig-zag search might perform worse than iterating over the filtered columns in parallel. We should investigate how bad it becomes and whether we need to take any mitigating actions (e.g. replacing short-distance seeks with iteration, either inside the stage or at the storage level).
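To make the comparison concrete, here is a minimal Python sketch of the two strategies over sorted lists of record ids. It is an illustration only, not the column_scan implementation: the function names, the use of plain integer record ids, and the counting of one "seek" per binary search and one "next" per cursor advance are all assumptions made for this example.

```python
from bisect import bisect_left


def zigzag_intersect(columns):
    """Intersect sorted record-id lists by seeking each column to the
    current candidate id. Returns (matches, number_of_seeks)."""
    if not columns:
        return [], 0
    positions = [0] * len(columns)
    matches, seeks = [], 0
    target = 0   # smallest record id that could still match (ids assumed >= 0)
    agreed = 0   # how many columns in a row contain `target`
    i = 0
    while True:
        col = columns[i]
        # A "seek": jump to the first entry with record id >= target.
        pos = bisect_left(col, target, positions[i])
        seeks += 1
        if pos == len(col):
            return matches, seeks  # one column is exhausted, no more matches
        positions[i] = pos
        if col[pos] == target:
            agreed += 1
        else:
            target = col[pos]      # this column proposes a larger candidate
            agreed = 1
        if agreed == len(columns):
            matches.append(target)
            target += 1
            agreed = 0
        i = (i + 1) % len(columns)


def parallel_intersect(columns):
    """Intersect the same lists by stepping cursors one entry at a time.
    Returns (matches, number_of_nexts)."""
    if not columns:
        return [], 0
    positions = [0] * len(columns)
    matches, nexts = [], 0
    while all(p < len(c) for p, c in zip(positions, columns)):
        heads = [c[p] for p, c in zip(positions, columns)]
        highest = max(heads)
        lagging = [k for k, h in enumerate(heads) if h < highest]
        if not lagging:
            # All cursors agree on the same record id: it is a match.
            matches.append(highest)
            lagging = list(range(len(columns)))
        for k in lagging:
            positions[k] += 1      # a "next": advance by one entry
            nexts += 1
    return matches, nexts
```

When every filter is always true, both functions touch each column once per record, so the zig-zag version gains nothing from its seeks and pays the higher per-operation cost. When one filter is highly selective, the zig-zag version skips most entries of the other columns.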

From my experiments on small datasets, the cost of a "seek" is about twice the cost of a "next", which means that the filters have to have really low selectivity to benefit from replacing seeks with nexts. For example, on a collection with 10^5 docs, a query with 7 always-true filters and one filter that lets through 30% of records performed about the same with seeks and with nexts. The seeks would likely be relatively more expensive on indexes that don't fit into memory, but we'd still have to use heuristics in the column_scan stage to pick between the two, such as the distance between the record ids, and I doubt such heuristics can be reliable enough to be useful.
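For illustration, a distance-based heuristic of the kind mentioned above could look like the following Python sketch. The function name and the `seek_cost_in_nexts` parameter are hypothetical; the default of 2.0 comes from the roughly 2x seek-to-next cost ratio measured on small datasets and would not carry over to indexes that don't fit into memory.

```python
def choose_step(current_rid: int, target_rid: int,
                seek_cost_in_nexts: float = 2.0) -> str:
    """Pick "next" or "seek" to move a column cursor forward to target_rid.

    Hypothetical heuristic: the record-id distance is used as a stand-in
    for the number of entries to skip. For integer record ids it is only
    an upper bound, since a sparse column may hold far fewer entries in
    that range.
    """
    if target_rid <= current_rid:
        raise ValueError("target_rid must be ahead of current_rid")
    distance = target_rid - current_rid
    # Iterating costs at most `distance` nexts; a seek costs a fixed amount.
    return "next" if distance <= seek_cost_in_nexts else "seek"
```

Because the distance only bounds the number of entries actually skipped, the choice can be wrong for sparse columns, which is one way such a heuristic could turn out unreliable.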



 Comments   
Comment by Irina Yatsenko (Inactive) [ 19/Oct/22 ]

Per the completed investigation, we have decided not to optimize "seeks" vs "nexts" at this time.
