[SERVER-74266] Run existing genny workloads in catalog shard mode as a one-off patch to evaluate any significant performance deviations. Created: 22/Feb/23 Updated: 12/Jun/23 Resolved: 12/Jun/23 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kshitij Gupta | Assignee: | Wenqin Ye |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding NYC |
| Sprint: | Sharding NYC 2023-04-17, Sharding NYC 2023-05-01, Sharding NYC 2023-05-15, Sharding NYC 2023-05-29, Sharding NYC 2023-06-12, Sharding NYC 2023-06-26 |
| Participants: | |
| Comments |
| Comment by Wenqin Ye [ 12/Jun/23 ] |
After some initial investigation, we were not able to determine whether the regressions in the $lookup and $graphLookup workloads are real. I have created SERVER-77983 to investigate further and will close this ticket so we can close out PM-2290. |
| Comment by Jack Mulrow [ 08/Jun/23 ] |
An update on this: Wenqin ran our genny tests after identifying setup differences in the config server and compiled the results of 3 runs here. There's a lot of variance between runs, and the only statistically significant regressions appear to be in the $lookup and $graphLookup workloads where the foreign collection is sharded. We're looking further into whether that is a real regression (it could be yet another setup difference or an assumption the workload makes), but the good news is that everything else appears to show no regression. |
| Comment by Garaudy Etienne [ 26/May/23 ] |
According to these test results, resharding ain't the only problem. Looks like regressions everywhere lol cc ratika.gandhi@mongodb.com |
| Comment by Wenqin Ye [ 13/Apr/23 ] |
Link to the workload patch: https://spruce.mongodb.com/version/64346dd05623438f7351377a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC |
| Comment by Wenqin Ye [ 13/Apr/23 ] |
Mini-writeup on the results from the genny workloads: https://docs.google.com/document/d/1eAioYLvz_haug2RkRx-5e9Dlscw-nT8xQ70WNWmkSag/edit#
The main issue we found was large regressions in the resharding workload's 90th- and 99th-percentile read/write latencies compared to the baseline. I created a ticket for someone with more knowledge of resharding to investigate: |