[SERVER-80124] Measure performance impact of background compaction Created: 16/Aug/23  Updated: 06/Feb/24

Status: In Progress
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Etienne Petrel
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2024-01-16 at 4.45.46 pm.png     PNG File Screenshot 2024-01-17 at 5.05.58 pm.png     PNG File Screenshot 2024-01-17 at 5.08.16 pm.png     PNG File background compaction FTDC perf.png     File background_compact.diff     Text File background_compact_10gb.txt     Text File background_compact_logs.txt     PNG File industry_benchmarks_ycsb_95read5update_perf.png     PNG File industry_benchmarks_ycsb_95read5update_t2.png     PNG File linbench2-hs-compact.png     PNG File linkbench2_disk.png     PNG File linkbench2_latencies.png     PNG File linkbench2_latencies_zoom_out.png     PNG File linkbench2_wiredtiger.png     PNG File linkbench2_wiredtiger_2.png     PNG File stats_second_performance.png     PNG File stats_ycsb_60GB_long.png     PNG File ycsb_60GB.long.2023-09_disk.png     PNG File ycsb_60GB.long.2023-09_latencies.png     PNG File ycsb_60GB.long.2023-09_wiredtiger.png    
Issue Links:
Depends
depends on SERVER-80121 Create the background compaction API Closed
Related
related to WT-12314 Compaction stays stuck in the estimat... Closed
related to WT-12296 Investigate whether compaction should... Backlog
related to WT-12297 Stop compacting if not enough space h... Backlog
related to WT-12312 Create a test where compaction occurs... Closed
related to WT-12319 Re-estimate work during compaction wh... Open
related to WT-12328 Stop compacting if a file has grown i... Closed
is related to WT-12078 Fix compact to handle pages in WT_REF... Closed
is related to WT-12132 Make background compaction skip the i... Closed
Sprint: 2023-12-12 - Heisenbug, 2024-01-09 - I Grew Tired, StorEng - 2024-01-23, 2024-02-06 tapioooooooooooooca, 2024-02-20_A_near-death_puffin
Participants:
Story Points: 5

 Comments   
Comment by Sean Watt [ 17/Jan/24 ]

Here are a few more interesting stats from llt_mixed that help give context about what compact is doing.
1. session table compact passes counts the number of passes compact does over a given table. Compact will do another pass if it has:
a. Moved some blocks from the last 90% of the file.
b. Encountered a deleted page.
c. Encountered a modified in-memory page.
In this case session table compact passes reaches ~25 before compact moves on to another table.
2. checkpoint progress state hovers between 6 (state sync file) and 8 (state bm sync), indicating that checkpoints are happening constantly and could be a bottleneck here.

Zooming in on the final part of the workload, it's evident that compact doesn't recover any meaningful space until after the workload finishes performing operations and compact is able to move on from the table it was stuck on. Once background compact moves on, it quickly passes over all of the other tables and reclaims a significant amount of space, in this case ~15GB.

Comment by Sean Watt [ 16/Jan/24 ]

Compared to the ycsb_60GB.long run etienne.petrel@mongodb.com mentions, I would expect llt_mixed to spend less time on compaction since the dataset is much smaller.

I believe the test operations are somehow causing compact to remain on a single table for an extended period, probably unnecessarily.

Comment by Etienne Petrel [ 13/Dec/23 ]

After some discussion with the team, we decided to skip that initial checkpoint in the case of background compaction. See WT-12132.

Comment by Etienne Petrel [ 12/Dec/23 ]

I found another test where the performance impact is noticeable while compaction has no work to do.
Patch build: link
Base: sys_perf_3ab2f6d2762b0065eae8a857e90d0a2e135f31e4
Test: secondary_performance
Perf dashboard: link

In the logs, each compact call results in "there is no useful work to do - skipping compaction", either because the file size is less than 1MB or because the number of available bytes does not meet the default 20MB threshold. So why do we see a perf impact?

FTDC:

While the server status metrics look quite the same between the baseline and the run with background compaction enabled, we can notice the following:

  • 2x more writes are queued; these are waiting for a ticket or on a lock.
  • 55 checkpoints are started by compaction. It is not visible on this screenshot, but the stat that indicates the number of checkpoints triggered by the API is 11 on the baseline.
  • This can also be seen in the number of files synced by checkpoint: almost 2x more with bg compaction enabled.

Checkpoints seem to be playing a role in the perf impact. Something I have noticed is that we always perform a checkpoint before the following two checks:

  • The file size is greater than 1MB
  • The number of bytes available is greater than the specified threshold

Both checks can be impacted by a checkpoint; however, the first one is quite cheap and could be worth performing without doing a checkpoint at all. Even though a checkpoint seems more crucial for the second check, I wonder if we could expect the first checkpoint to be done by the application instead of hiding it at the start of compaction.

I created a patch build where compaction does not generate the first checkpoint, and there is no perf regression. I will check with the team if there is something we can do (EDIT: the discussion led to WT-12132).

Comment by Etienne Petrel [ 08/Dec/23 ]

These are the stats for ycsb_60GB_long: base (background compaction disabled) vs bg (background compaction enabled). We can see that eviction is greatly impacted by the background compaction server. Two stats to highlight are:

  • cache modified pages evicted by application threads (~x3 with bg compaction)
  • cache pages evicted by application threads (~x2 with bg compaction)

Since application threads are heavily solicited to perform eviction, they struggle to maintain throughput for the test duration, hence the performance regression.

Comment by Etienne Petrel [ 08/Dec/23 ]

We are still not able to explain the EBUSY due to a conflicting checkpoint. Why does this specific table take so much more time than the others that it ends up conflicting with a checkpoint?

In this patch build, I could observe two things:

  • For ycsb_60GB, compaction had nothing to compact and there is no perf regression.
  • For ycsb_60GB.long, compaction found work to do on the collection collection-41-16287482622083310647.wt and this is the timeline:

# Start of the test
2023-12-07T08:16:11.794+00:00
2023-12-07T08:16:13.639+00:00 Compacting file:collection-41-16287482622083310647.wt
...
2023-12-07T08:27:03.353+00:00 Compaction conflicts with checkpoint
# End of the test
2023-12-07T08:28:37.544+00:00

Compaction ran for ~11 min, which is the majority of the test duration. It makes sense that compaction had time to conflict with checkpoints, as MongoDB tries to take one every minute. The test focuses on this collection while compaction is trying to work on it, which creates contention and leads to poor perf for the ycsb_50read50update scenario: Z-Score: -80.5, Percent Diff (Region): -58.2%.

Comment by Sean Watt [ 04/Dec/23 ]

This configuration (background_compact_10gb.txt) and change (background_compact.diff) to the background compact cppsuite test readily reproduce the scenario of compact completing 100 iterations over a file but failing to recover any space.

Some example logs: background_compact_logs.txt

[1701732616:919716][2044712:0x7fca6c7f8640], file:collection_0.wt, WT_SESSION.compact: [WT_VERB_COMPACT][DEBUG_1]: ============ ending compaction pass
[1701732616:919753][2044712:0x7fca6c7f8640], file:collection_0.wt, WT_SESSION.compact: [WT_VERB_COMPACT][DEBUG_1]: pages reviewed: 240800
[1701732616:919769][2044712:0x7fca6c7f8640], file:collection_0.wt, WT_SESSION.compact: [WT_VERB_COMPACT][DEBUG_1]: pages skipped: 240800
[1701732616:919781][2044712:0x7fca6c7f8640], file:collection_0.wt, WT_SESSION.compact: [WT_VERB_COMPACT][DEBUG_1]: pages rewritten: 0 (297097 expected)
[1701732616:919794][2044712:0x7fca6c7f8640], file:collection_0.wt, WT_SESSION.compact: [WT_VERB_COMPACT][DEBUG_1]: file size 11559MB (12121391104B) with 31% space available 3614MB (3789975552B)

Comment by Etienne Petrel [ 04/Dec/23 ]

Update:

  • We are still not able to explain the EBUSY due to a conflicting checkpoint. Why does this specific table take so much more time than the others that it ends up conflicting with a checkpoint?
  • We found out that compaction can get stuck or make minimal progress when it finds a page in the WT_REF_DELETED state: it waits for a checkpoint to operate on that page, the checkpoint does not, compaction finds the same page still in WT_REF_DELETED, hopes the next checkpoint will do something, and so on. A theory is that we should not expect the following checkpoint to process those pages, as that is not guaranteed, and should instead force compaction to move on.
    • WT-12078 has been created to fix this scenario.

Comment by Etienne Petrel [ 29/Nov/23 ]

I noticed a mistake I made here: I set free_space_target to 1MB, which is way too low and will trigger compaction as soon as 1MB can be recovered. I will re-run a patch with the default value (20MB); it may make no difference in the end, but it is worth re-trying.

Comment by Etienne Petrel [ 24/Nov/23 ]

Perf patch builds:

They unfortunately don't show good results at first. After investigating with sean.watt@mongodb.com, we observed an interesting pattern. Here is an example using the ycsb_60GB.long test:

  • The mongod.0 logs show the following:

    ..."session_dhandle_name":"file:_mdb_catalog.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"_mdb_catalog.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-0-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-0-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-10-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-10-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-12-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-12-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-14-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-14-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-15-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-15-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-17-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-17-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-19-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-19-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-2-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-2-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-22-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-22-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-24-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-24-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-26-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-26-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-27-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-27-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-28-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-28-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-32-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-32-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-35-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-35-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-38-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-38-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ..."session_dhandle_name":"file:collection-4-15430505529018804863.wt","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"collection-4-15430505529018804863.wt: there is no useful work to do - skipping compaction"}}}
    ...
    # 1 minute later
    {"t":{"$date":"2023-11-23T07:06:28.283+00:00"},"s":"I",  "c":"WTCMPCT",  "id":22430,   "svc":"u", "ctx":"thread12","msg":"WiredTiger message","attr":{"message":{"ts_sec":1700723188,"ts_usec":283501,"thread":"10970:0xffff805e2d90","session_name":"WT_SESSION.compact","category":"WT_VERB_COMPACT","category_id":9,"verbose_level":"INFO","verbose_level_id":0,"msg":"Data handle compaction failed with EBUSY but the cache is not stuck. Will give it another go."}}}
    

  • The most interesting bit is the last line, which shows up 40 minutes later. It does not seem right, as it means background compaction skipped a bunch of files, then blocked on a specific file only to finally get EBUSY.
  • To prove background compaction is running the whole time, we can see in the FTDC that the table compact running stat is always on, which means compaction is active:

Next steps:

  • Add more logs to understand what compaction is trying to do
  • Improve the logs to state which dhandle is returning EBUSY 40 min later
Generated at Thu Feb 08 06:42:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.