[SERVER-80124] Measure performance impact of background compaction Created: 16/Aug/23 Updated: 06/Feb/24 |
|
| Status: | In Progress |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Etienne Petrel |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Issue Links: |
| Sprint: | 2023-12-12 - Heisenbug, 2024-01-09 - I Grew Tired, StorEng - 2024-01-23, 2024-02-06 tapioooooooooooooca, 2024-02-20_A_near-death_puffin |
| Participants: | |
| Story Points: | 5 |
| Comments |
| Comment by Sean Watt [ 17/Jan/24 ] |

Here are a few more interesting stats from llt_mixed that help give context about what compact is doing.
| Comment by Sean Watt [ 16/Jan/24 ] |

Compared to the ycsb_60GB.long test etienne.petrel@mongodb.com mentions, I would expect llt_mixed to spend less time on compaction, since its dataset is much smaller. I believe the test operations are somehow causing compact to remain on a single table for an extended period, probably unnecessarily.
| Comment by Etienne Petrel [ 13/Dec/23 ] |

After some discussion with the team, we decided to skip that initial checkpoint in the case of background compaction. See
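A minimal sketch of the resulting control flow, assuming compaction takes an initial checkpoint and then runs the "is there useful work" checks described in the 12/Dec/23 comment, and assuming the foreground path keeps its checkpoint. The names `run_compact`, `take_checkpoint` and `has_useful_work` are hypothetical and do not correspond to WiredTiger functions; the callables only stand in for the real storage-engine operations.

```python
# Hypothetical sketch: run_compact, take_checkpoint and has_useful_work are
# illustrative names, not WiredTiger or MongoDB APIs.
from typing import Callable, List


def run_compact(background: bool,
                take_checkpoint: Callable[[], None],
                has_useful_work: Callable[[], bool]) -> List[str]:
    """Return the sequence of steps a compact call would take."""
    steps: List[str] = []
    if not background:
        # Assumed: foreground compaction keeps the initial checkpoint.
        take_checkpoint()
        steps.append("checkpoint")
    # Background compaction goes straight to the checks, so an idle
    # background pass no longer forces a checkpoint.
    if has_useful_work():
        steps.append("compact")
    else:
        steps.append("skip")
    return steps


# Usage: a background pass with nothing to do performs no checkpoint.
checkpoints = []
print(run_compact(True, lambda: checkpoints.append("ckpt"), lambda: False))  # ['skip']
print(checkpoints)  # []
```

Passing the checkpoint and the checks in as callables keeps the sketch runnable without a storage engine while still showing where the skipped checkpoint sits.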
| Comment by Etienne Petrel [ 12/Dec/23 ] |

I found another test where the performance impact is noticeable even though compaction has no work to do. In the logs, each compact call reports that there is no useful work to do and skips compaction, either because the file size is less than 1MB or because the number of available bytes does not meet the default 20MB threshold. So why do we see perf impacts? FTDC: while the server status metrics look much the same between the baseline and the run with background compaction enabled, we can notice the following:
Checkpoints seem to play a role in the perf impact. Something I have noticed is that we always perform a checkpoint before the following two checks:
Both checks can be affected by a checkpoint; however, the first one is quite cheap, and it could be worth performing it without a checkpoint. Even though the checkpoint seems more crucial for the second check, I wonder whether we could expect the first checkpoint to be done by the application instead of hiding it at the start of compaction. I created a patch build where compaction does not generate the first checkpoint, and there is no perf regression. I will check with the team if there is something we can do (EDIT: the discussion led to
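To make the two skip conditions from the logs concrete, here is an illustrative sketch. It assumes 1MB = 2^20 bytes and treats the 20MB threshold as the default free_space_target; `compaction_skip_reason` is a hypothetical helper, not the actual WiredTiger check, and the order in which it tests the two conditions is an assumption.

```python
# Illustrative sketch of the two skip conditions seen in the logs.
# compaction_skip_reason is a hypothetical helper; the check order and the
# 1MB = 2**20 bytes convention are assumptions.
from typing import Optional

MB = 1024 * 1024
MIN_FILE_SIZE = 1 * MB                 # files below 1MB are not compacted
DEFAULT_FREE_SPACE_TARGET = 20 * MB    # assumed default free_space_target


def compaction_skip_reason(file_size: int, available_bytes: int,
                           free_space_target: int = DEFAULT_FREE_SPACE_TARGET
                           ) -> Optional[str]:
    """Return why compaction would be skipped, or None if it would proceed."""
    if file_size < MIN_FILE_SIZE:
        return "file size is less than 1MB"
    if available_bytes < free_space_target:
        return "available bytes below free_space_target"
    return None


print(compaction_skip_reason(512 * 1024, 0))      # file size is less than 1MB
print(compaction_skip_reason(100 * MB, 5 * MB))   # available bytes below free_space_target
print(compaction_skip_reason(100 * MB, 25 * MB))  # None
```

In the run described above, every compact call ended in one of the two skip branches, so any checkpoint taken before reaching them is pure overhead.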
| Comment by Etienne Petrel [ 08/Dec/23 ] |

These are the stats for ycsb_60GB_long base (background compaction disabled) vs. bg (background compaction enabled). We can see that eviction is heavily impacted by the background compaction server. Two stats to highlight are:
Since application threads are heavily called upon to perform eviction, they struggle to maintain throughput over the test duration, hence the performance regression.
| Comment by Etienne Petrel [ 08/Dec/23 ] |
In this patch build, I could observe two things:
Compaction ran for ~11 min, which is the majority of the test duration. It makes sense that compaction had time to conflict with checkpoints, as MongoDB tries to take one every minute. The test focuses on this collection while compaction is trying to work on it, which creates contention and leads to poor perf for the ycsb_50read50update scenario: Z-Score: -80.5, Percent Diff (Region): -58.2%.
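A back-of-the-envelope sketch of how many checkpoints can overlap an ~11 min compaction run, assuming checkpoints start at fixed 60 s intervals (an approximation of "one every minute"). `checkpoints_during` is a hypothetical helper; real checkpoint timing also depends on how long each checkpoint takes.

```python
# Back-of-the-envelope model: checkpoints start at every non-negative
# multiple of interval_s (an approximation of "one every minute").
# checkpoints_during is a hypothetical helper, not a MongoDB API.


def checkpoints_during(start_s: int, duration_s: int, interval_s: int = 60) -> int:
    """Count checkpoint start times inside [start_s, start_s + duration_s)."""
    if start_s < 0 or duration_s < 0 or interval_s <= 0:
        raise ValueError("expected start_s >= 0, duration_s >= 0, interval_s > 0")
    end_s = start_s + duration_s
    # Smallest multiple of interval_s that is >= start_s (ceiling division).
    first = -(-start_s // interval_s) * interval_s
    return len(range(first, end_s, interval_s))


print(checkpoints_during(0, 11 * 60))    # 11
print(checkpoints_during(30, 11 * 60))   # 11
```

Under this model, any 11-minute window overlaps 11 checkpoint starts regardless of phase, so compaction has plenty of opportunities to conflict with checkpoints.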
| Comment by Sean Watt [ 04/Dec/23 ] |

This configuration (background_compact_10gb.txt Some example logs: background_compact_logs.txt
| Comment by Etienne Petrel [ 04/Dec/23 ] |

Update:
| Comment by Etienne Petrel [ 29/Nov/23 ] |

I noticed a mistake I made here: I set free_space_target to 1MB, which is way too low and will likely trigger compaction as soon as we can recover 1MB. I will re-run a patch with the default value (20MB); it may have no impact in the end, but it is worth retrying.
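A small illustration of why a 1MB free_space_target is too aggressive: with the same hypothetical reclaimable sizes (made-up values, not measurements from the patch builds), more tables qualify for compaction at 1MB than at the 20MB default. `tables_to_compact` is an illustrative helper, and it assumes a table qualifies when its reclaimable space meets or exceeds the target.

```python
# Hypothetical comparison of free_space_target values. The reclaimable sizes
# below are made-up illustrative inputs, not data from the patch builds, and
# tables_to_compact is an illustrative helper.
from typing import Dict, List

MB = 1024 * 1024


def tables_to_compact(reclaimable: Dict[str, int], free_space_target: int) -> List[str]:
    """Return (sorted) tables whose reclaimable bytes meet free_space_target."""
    return sorted(name for name, nbytes in reclaimable.items()
                  if nbytes >= free_space_target)


reclaimable = {"table_a": 2 * MB, "table_b": 8 * MB, "table_c": 30 * MB}
print(tables_to_compact(reclaimable, 1 * MB))    # ['table_a', 'table_b', 'table_c']
print(tables_to_compact(reclaimable, 20 * MB))   # ['table_c']
```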
| Comment by Etienne Petrel [ 24/Nov/23 ] |

Perf patch builds: unfortunately, they don't show good results at first. After investigating with sean.watt@mongodb.com, we observed an interesting pattern. Here is an example using the ycsb_60GB.long test:
Next steps: