[SERVER-74266] Run existing genny workloads in catalog shard mode as a one-off patch to evaluate any significant performance deviations. Created: 22/Feb/23  Updated: 12/Jun/23  Resolved: 12/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kshitij Gupta Assignee: Wenqin Ye
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-77983 Investigate performance regressions i... Backlog
Assigned Teams:
Sharding NYC
Sprint: Sharding NYC 2023-04-17, Sharding NYC 2023-05-01, Sharding NYC 2023-05-15, Sharding NYC 2023-05-29, Sharding NYC 2023-06-12, Sharding NYC 2023-06-26
Participants:

 Comments   
Comment by Wenqin Ye [ 12/Jun/23 ]

After some initial investigation, we were not able to determine whether the regressions in the $lookup and $graphLookup workloads are real. I have created SERVER-77983 to investigate further and will close this ticket so we can close out PM-2290.

Comment by Jack Mulrow [ 08/Jun/23 ]

An update on this: Wenqin ran our genny tests after identifying setup differences in the config server and compiled the results of 3 runs here. There's a lot of variance between runs, and the only statistically significant regressions seem to be in the $lookup and $graphLookup workloads where the foreign collection is sharded.

We're looking further into whether that is a real regression (it could be yet another setup difference or an assumption the workload makes), but the good news is that everything else seems to show no regression.

Comment by Garaudy Etienne [ 26/May/23 ]

According to these test results, resharding ain't the only problem. Looks like regressions everywhere lol cc ratika.gandhi@mongodb.com

Comment by Wenqin Ye [ 13/Apr/23 ]

Link to the workload patch: https://spruce.mongodb.com/version/64346dd05623438f7351377a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

Comment by Wenqin Ye [ 13/Apr/23 ]

Mini-writeup on the results from the genny workload: https://docs.google.com/document/d/1eAioYLvz_haug2RkRx-5e9Dlscw-nT8xQ70WNWmkSag/edit#

The main issue we found was large regressions in the resharding workload's 90th and 99th percentile read/write latencies compared to the baseline. I created a ticket for someone with more knowledge of resharding to investigate: SERVER-76076

Generated at Thu Feb 08 06:26:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.