[SERVER-80576] Microbenchmarks - investigate regression in $in queries
Created: 31/Aug/23 | Updated: 31/Jan/24
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Anton Korshunov | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | M7 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Optimization |
| Participants: | |
| Description |
Specifically, in the following workload: Queries.UnindexedVeryLargeInSortedMatching
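For reference, the benchmark shape is an unindexed collection queried with a very large, sorted $in list. A minimal reproduction sketch, assuming the mongocxx driver; the database, collection, field name, and list size are illustrative assumptions, not taken from the benchmark source:

    #include <cstdint>
    #include <bsoncxx/builder/basic/array.hpp>
    #include <bsoncxx/builder/basic/document.hpp>
    #include <bsoncxx/types.hpp>
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    #include <mongocxx/uri.hpp>

    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    int main() {
        mongocxx::instance inst{};
        mongocxx::client client{mongocxx::uri{}};  // defaults to localhost:27017
        auto coll = client["bench"]["unindexed"];

        // Build {x: {$in: [0, 1, ..., 999]}}: a very large $in whose elements
        // are already sorted, mirroring the "VeryLargeInSorted" shape.
        bsoncxx::builder::basic::array elems;
        for (std::int32_t i = 0; i < 1000; ++i) {
            elems.append(i);
        }
        auto filter = make_document(
            kvp("x", make_document(kvp("$in", bsoncxx::types::b_array{elems.view()}))));

        // 'x' is unindexed, so this runs a collection scan with a large $in filter.
        for (auto&& doc : coll.find(filter.view())) {
            (void)doc;
        }
        return 0;
    }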
| Comments |
| Comment by Timour Katchaounov [ 19/Jan/24 ] |
The problem with slow optimization is solved by constant pooling, as described in SERVER-85436 and the design doc.
The problem with slow query execution is solved by query cache parameterization. It is not clear why the non-parameterized plan is so much slower.
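To make the two fixes concrete, below is a minimal, hypothetical sketch of constant pooling combined with parameterization: the large $in constant list is detached from the query shape into a side pool, so the parameterized shape can be cached and reused across queries that differ only in their constants. All names here are invented for illustration; the actual SERVER-85436 design differs.

    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical, heavily simplified expression node. Real optimizer nodes
    // are far richer; this only illustrates the parameterization idea.
    struct InExpr {
        std::string path;            // e.g. "x" in {x: {$in: [...]}}
        std::vector<int> constants;  // inline $in list
        int paramSlot = -1;          // >= 0 once parameterized
    };

    // Move the inline constant list into a side table ("constant pooling") and
    // leave a parameter slot behind. The resulting constant-free shape is what
    // gets cached, so structurally identical queries skip re-optimization.
    int parameterize(InExpr& e, std::vector<std::vector<int>>& pool) {
        pool.push_back(std::move(e.constants));
        e.constants.clear();
        e.paramSlot = static_cast<int>(pool.size()) - 1;
        return e.paramSlot;
    }

    int main() {
        std::vector<std::vector<int>> pool;
        InExpr in{"x", {1, 2, 3, 4, 5}};  // stands in for thousands of elements
        int slot = parameterize(in, pool);
        std::cout << "cacheable shape: {" << in.path << ": {$in: <param " << slot
                  << ">}} with " << pool[slot].size() << " pooled constants\n";
        return 0;
    }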
| Comment by Timour Katchaounov [ 14/Dec/23 ] |
This is superseded by the work on skipping the SargableNode rewrite.
| Comment by Timour Katchaounov [ 12/Oct/23 ] |
Profiling and analysis of the remaining overhead beyond the Filter->SargableNode conversion. These benchmarks and profiles were run after disabling the Filter->SargableNode rewrite.
Linux perf/FlameGraph analysis of the performance profile without the Filter->SargableNode conversion, based on running a single test, "Queries.UnindexedVeryLargeInSortedMatching".
Intel VTune analysis based on running a unit test with 1000 sequential (sorted) $in elements, directly generated as an ABT.
General conclusions: The Filter->SargableNode conversion (mostly interval simplification) is the major performance problem for large $in; however, even if it were infinitely fast, there would still be substantial overhead (50% worse QPS than the classic engine). This additional overhead is due to copying the $in list a few extra times (see the sketch at the end of this comment):
In addition, there seems to be extra overhead in OptPhaseManager::runMemoPhysicalRewrite, which calls:
It is not clear how large this overhead is, because some profiling approaches do not show it as a primary problem; depending on the profiling method, a different stage appears to be the primary CPU consumer.
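To illustrate why the extra copies matter, here is a small standalone benchmark sketch (an illustration, not server code) comparing a few deep copies of a large constant list against passing around one shared immutable copy:

    #include <chrono>
    #include <iostream>
    #include <memory>
    #include <vector>

    int main() {
        constexpr int kElems = 100000;  // size of the $in list
        constexpr int kCopies = 5;      // "a few extra times"
        std::vector<long long> in(kElems);
        for (int i = 0; i < kElems; ++i) in[i] = i;

        // Deep-copying the constant list at each stage, as the profile
        // suggests happens today.
        auto t0 = std::chrono::steady_clock::now();
        for (int c = 0; c < kCopies; ++c) {
            std::vector<long long> copy = in;       // one full deep copy per stage
            volatile long long sink = copy.back();  // keep the copy from being elided
            (void)sink;
        }
        auto t1 = std::chrono::steady_clock::now();

        // Sharing one immutable list instead: each stage holds a ref-counted
        // handle, so the elements are materialized exactly once.
        auto shared = std::make_shared<const std::vector<long long>>(in);
        for (int c = 0; c < kCopies; ++c) {
            auto handle = shared;  // cheap refcount bump, no element copies
            volatile long long sink = handle->back();
            (void)sink;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::cout << "deep copies: " << ms(t1 - t0).count() << " ms\n"
                  << "shared list: " << ms(t2 - t1).count() << " ms\n";
        return 0;
    }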
| Comment by Timour Katchaounov [ 05/Oct/23 ] |
A performance investigation based on a unit test of an $in with 1000 elements in increasing order shows that the majority of the time is spent in the function unionDNFIntervals, as follows:
The analysis was performed using the Intel VTune profiler.
The reasons for these high processing costs are:
There are several approaches to improve the performance of large $in queries:
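The concrete approaches were elided from this export. As one illustrative direction only (an assumption, not necessarily what was proposed here): when the $in elements are already sorted, unioning their point intervals degenerates into a linear de-duplication pass rather than a general pairwise interval merge. A hypothetical sketch:

    #include <algorithm>
    #include <cassert>
    #include <vector>

    // Hypothetical simplified point interval [v, v]. The real
    // unionDNFIntervals operates on full interval DNF trees; this only
    // sketches the sorted-input fast path.
    struct PointInterval {
        long long v;
    };

    // A general union must consider interval pairs to merge overlaps, which is
    // quadratic in the worst case. For $in, every interval is a point, so if
    // the input is sorted the union is a single O(n) de-duplication pass.
    std::vector<PointInterval> unionSortedPoints(std::vector<PointInterval> in) {
        assert(std::is_sorted(in.begin(), in.end(),
                              [](const PointInterval& a, const PointInterval& b) {
                                  return a.v < b.v;
                              }));
        auto last = std::unique(in.begin(), in.end(),
                                [](const PointInterval& a, const PointInterval& b) {
                                    return a.v == b.v;
                                });
        in.erase(last, in.end());
        return in;
    }

    int main() {
        auto out = unionSortedPoints({{1}, {1}, {2}, {3}, {3}});
        assert(out.size() == 3);  // {1, 2, 3}
        return 0;
    }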