[SERVER-70041] POC: Optimise available ticket count based on time spent in WiredTiger Created: 28/Sep/22 Updated: 14/Oct/22 Resolved: 14/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Sulabh Mahajan | Assignee: | Sulabh Mahajan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Sprint: | Execution Team 2022-10-03, Execution Team 2022-10-17 | ||||
| Participants: | |||||
| Description |
|
WiredTiger maintains several statistics that reflect how long operations take inside the storage engine and whether application threads are waiting on page reads or writes. We will experiment with adjusting the total available ticket count based on these statistics to balance workload performance against concurrency in the storage engine. The relevant statistics that I think could be helpful are the following:
|
| Comments |
| Comment by Sulabh Mahajan [ 11/Oct/22 ] | |||||||||||
|
Update:
I tried the above scheme with YCSB 60 - 100% reads with 128 and 16 threads separately. Here is how they compare from the perspective of the read load score:
The pattern is very clear when the two runs are compared. Next, I want to try adjusting the read tickets in real time if the load score is above a certain value, for instance, 50 or even 10.
The load patterns for the writes are not very clear. They benefit from higher concurrency, but I also realised that, since the writes mostly go to the cache, the histogram buckets might not be sized correctly to capture a change in latency distribution. For instance, here is the load score for the writes with 128 and 16 tickets:
Note that the load score is calculated as follows, with bucket 0 (operations faster than 100us) getting a weight of 0. It is effectively a score that summarizes the distribution of latencies inside WiredTiger, and it is higher when the distribution in the collected histograms shifts towards a larger tail.
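As a rough sketch of a score of this kind, and of the threshold-based read-ticket adjustment mentioned above, the Python below may help. Everything in it is an assumption made for illustration: the linear bucket weights, the normalisation by operation count, the function names `load_score` and `adjust_read_tickets`, the step size, the ticket bounds, and the choice to lower tickets when the score is high. Only the zero weight for bucket 0 comes from the note above, so the scale of this score will not match the values of 50 or 10 quoted there.

```python
from typing import Optional, Sequence


def load_score(bucket_counts: Sequence[int],
               weights: Optional[Sequence[float]] = None) -> float:
    """Summarise a latency histogram as a single weighted score.

    bucket_counts[i] is the number of operations that fell in bucket i.
    Bucket 0 (operations faster than 100us) must carry a weight of 0.
    Assumption: when no weights are given, bucket i gets weight i.
    """
    if weights is None:
        weights = list(range(len(bucket_counts)))  # hypothetical weights
    if len(weights) != len(bucket_counts):
        raise ValueError("weights and bucket_counts must have equal length")
    if weights and weights[0] != 0:
        raise ValueError("bucket 0 must have weight 0")

    total = sum(bucket_counts)
    if total == 0:
        return 0.0  # no operations recorded, so no load to report
    weighted = sum(w * c for w, c in zip(weights, bucket_counts))
    # Normalising by the operation count is an assumption; it makes the
    # score depend on the shape of the distribution, not on throughput.
    return weighted / total


def adjust_read_tickets(current: int, score: float, threshold: float = 50.0,
                        min_tickets: int = 16, max_tickets: int = 128,
                        step: int = 8) -> int:
    """Return a new read ticket count for the next interval.

    Assumption: a score above the threshold means reads are queueing
    inside the storage engine, so concurrency is reduced; otherwise it
    is raised again. The result is clamped to [min_tickets, max_tickets].
    """
    if score > threshold:
        return max(min_tickets, current - step)
    return min(max_tickets, current + step)
```

For example, a histogram with counts `[1, 1, 2]` scores `(0*1 + 1*1 + 2*2) / 4 = 1.25` under these assumed weights, and a histogram whose operations all land in bucket 0 scores 0 regardless of how many operations ran.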
|