[SERVER-76076] Investigate latency regressions in resharding workload while in catalog shard mode Created: 13/Apr/23  Updated: 27/Oct/23  Resolved: 07/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Wenqin Ye Assignee: Wenqin Ye
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding NYC
Sprint: Sharding NYC 2023-06-12
Participants:

Description

As part of SERVER-74266, we ran a genny workload patch with catalog shard mode enabled for the `shard` and `shard-single` suites. The results of the patch can be found here: https://spruce.mongodb.com/version/64346dd05623438f7351377a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

Analyzing the patch with the performance analyzer tool, we found that the `reshard_collection_mixed` workload had the largest latency regressions. Specifically, the 99th percentile latencies regressed significantly for reads and writes to a collection being resharded.
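
For anyone reproducing the comparison by hand, here is a minimal sketch of how a 99th percentile latency regression between a baseline run and a patch run could be computed from raw latency samples. It is not how the performance analyzer tool works internally. The function names `percentile` and `p99_regression_pct` and the nearest-rank percentile definition are assumptions made for illustration, and the sample values in the usage note are made up, not taken from the patch results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def p99_regression_pct(baseline_samples, patch_samples):
    """Percent change in p99 latency from baseline to patch.
    A positive result means the patch run is slower at the tail."""
    base = percentile(baseline_samples, 99)
    new = percentile(patch_samples, 99)
    return (new - base) / base * 100
```

With hypothetical inputs such as `baseline = [100] * 100` and `patch = [100] * 98 + [150, 150]`, the two p99 values are 100 and 150, so `p99_regression_pct(baseline, patch)` reports a 50% regression even though most operations are unchanged. This is why tail percentiles are worth checking separately from the mean.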

The following is a mini-writeup with more details on the genny workload patch and the specific regressions noticed: https://docs.google.com/document/d/1eAioYLvz_haug2RkRx-5e9Dlscw-nT8xQ70WNWmkSag/edit?usp=sharing

Someone with knowledge of resharding should investigate these regressions and determine whether anything needs to be fixed or improved.

