[SERVER-64185] Investigate performance regression of $lookup and $graphLookup in genny workloads Created: 03/Mar/22 Updated: 27/Oct/23 Resolved: 15/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Hana Pearlman | Assignee: | David Storch |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | pm2697-m3, sbe | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | QE 2022-09-19 | ||||
| Participants: | |||||
| Linked BF Score: | 27 | ||||
| Description |
|
There is a 15-35% regression in $lookup and $graphLookup genny workloads running with unsharded collections that was first seen between v5.0 and v5.1. The workloads in question are here: $lookup and $graphLookup. The regression can be seen in the linked BF or when looking at the sys-perf waterfall for v5.0 and v5.1 (select average latency for "RunGraphLookups.GraphLookupUnshardedToUnshardedOneToMany", for example). Some ideas have been proposed as to why the regression occurred and what can be done to address it. For example, it may have something to do with slow collection scans (the workloads in question use small collections). It may be that the plan cache project, particularly A more detailed write-up can be found in the comments. |
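For reference, a regression percentage like the 15-35% figure above is conventionally computed as the relative increase in average latency over the baseline. The sketch below shows that arithmetic only; the function name and the latency values are hypothetical placeholders, not measurements from the sys-perf waterfall.

```python
def percent_regression(baseline_ms: float, current_ms: float) -> float:
    """Relative latency increase of current over baseline, in percent.

    A positive result means the current run is slower than the baseline.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (current_ms - baseline_ms) / baseline_ms * 100.0


# Hypothetical average latencies (ms), chosen only to illustrate the formula.
v50_latency_ms = 10.0
v51_latency_ms = 12.5

print(percent_regression(v50_latency_ms, v51_latency_ms))  # 25.0
```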
| Comments |
| Comment by David Storch [ 13/Sep/22 ] | |||||||||||||||||||
|
It looks like the reason for the AutoRun problem I mentioned above may be that schedule_patch_auto_tasks and schedule_variant_auto_tasks are not specified for the "linux-1-node-replSet-classic-query-engine" build variant. I'm testing that the following patch, when combined with the in-progress changes from
| |||||||||||||||||||
| Comment by David Storch [ 13/Sep/22 ] | |||||||||||||||||||
|
I took a brief look using the Evergreen UI at a recent run of these benchmarks in master compared to a recent run in 5.0. In both cases I used the "release configuration" which means that we should be using the classic engine. This assumes that the $lookup queries in these benchmarks are not eligible for SBE pushdown, which indeed appears to be the case because they specify the pipeline option. Also note that using SBE on the inner side of a $lookup or $graphLookup is no longer permitted in the unsharded case due to the changes in As a final step, I think we should verify that these benchmarks do not regress when featureFlagSbeFull is enabled. We have a system which automatically generates an SBE vs. classic performance comparison every Thursday, but unfortunately the data for these benchmarks appears to be missing from the data set. It looks like this is because the benchmarks are not currently running in either the "all feature flags" or "classic engine" build variants – I'll have to look into why the AutoRun configuration for the workload is not behaving as I would expect it to. |
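To make the distinction in the comment above concrete, here is a minimal sketch contrasting the two $lookup syntaxes: the equality-match form (localField/foreignField) and the form that specifies the pipeline option. The helper lookup_uses_pipeline, the collection name, and the field names are hypothetical and exist only for illustration. The check merely detects the pipeline option; it is not the server's actual SBE pushdown eligibility logic.

```python
def lookup_uses_pipeline(stage: dict) -> bool:
    """Return True if the stage is a $lookup that specifies the 'pipeline' option."""
    spec = stage.get("$lookup")
    return isinstance(spec, dict) and "pipeline" in spec


# Equality-match form: joins on localField == foreignField.
# Collection and field names are made up for this example.
equality_lookup = {
    "$lookup": {
        "from": "inventory",
        "localField": "item",
        "foreignField": "sku",
        "as": "matches",
    }
}

# Pipeline form: runs a sub-pipeline against the foreign collection.
# Per the comment above, the benchmark queries use this form.
pipeline_lookup = {
    "$lookup": {
        "from": "inventory",
        "let": {"item": "$item"},
        "pipeline": [{"$match": {"$expr": {"$eq": ["$sku", "$$item"]}}}],
        "as": "matches",
    }
}

print(lookup_uses_pipeline(equality_lookup))  # False
print(lookup_uses_pipeline(pipeline_lookup))  # True
```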