Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Checkpoints
Labels: None
Assigned Teams: Storage Engines, Storage Engines - Foundations
Sprint: None
Story Points: 13
Estimated Weeks: 0

Currently, the checkpoint is always performed by a single thread, which iterates over the list of dirty btrees and processes them one after another. This is a good candidate for multithreaded execution, because processing each btree is almost completely independent of the others and can therefore be done concurrently.

This work was started as part of Skunkworks (two years in a row), so we already have a POC PR that implements the following approach:

- The main checkpoint thread starts the checkpoint.
- It iterates over all dirty dhandles and puts them into a mutex-protected queue.
- It then signals the worker threads that there are btrees to reconcile.
- Each worker thread that manages to take a btree from the queue processes it and then reports the information needed to update the metadata back to the main thread.
- The main thread updates the metadata accordingly. Note that the metadata update must be done as a single transaction, which is why all of the updates are performed by the main thread.
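
The coordination in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PR's code. `DirtyBtree`, `MetadataUpdate`, `reconcile` and `parallel_checkpoint` are hypothetical names. Python's `queue.Queue` (internally lock-protected) stands in for the mutex-protected queue. The workers are created per checkpoint rather than being long-lived. Building one dictionary at the end stands in for the single metadata transaction. Because of the GIL, it shows the flow of work and results, not the performance characteristics.

```python
import queue
import threading
from dataclasses import dataclass


# Hypothetical stand-ins: a dirty dhandle is reduced to a name and a
# dirty-page count; reconciling it yields the info the main thread records.
@dataclass(frozen=True)
class DirtyBtree:
    name: str
    dirty_pages: int


@dataclass(frozen=True)
class MetadataUpdate:
    name: str
    pages_written: int


def reconcile(btree: DirtyBtree) -> MetadataUpdate:
    # Placeholder for the real per-btree checkpoint (reconciliation) work.
    return MetadataUpdate(btree.name, btree.dirty_pages)


def parallel_checkpoint(btrees, num_workers=4):
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    work = queue.Queue()     # shared, lock-protected work queue
    results = queue.Queue()  # workers report metadata updates here

    # Main thread: enqueue every dirty btree, then one stop marker per worker.
    for bt in btrees:
        work.put(bt)
    for _ in range(num_workers):
        work.put(None)

    def worker():
        # Each worker keeps taking btrees until it sees a stop marker.
        while True:
            bt = work.get()
            if bt is None:
                return
            results.put(reconcile(bt))

    # Starting the workers plays the role of "signalling" them.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Main thread only: apply all metadata updates in one batch, standing in
    # for the single metadata transaction.
    metadata = {}
    while not results.empty():
        update = results.get()
        metadata[update.name] = update.pages_written
    return metadata
```

For example, `parallel_checkpoint([DirtyBtree("a", 3), DirtyBtree("b", 5)], num_workers=2)` returns `{"a": 3, "b": 5}`. Workers only produce updates and never touch the metadata themselves, which keeps the single-transaction constraint in one place.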

Since the PR is in an early POC stage, it is far from production quality. I’d like to outline the next steps as I currently see them:

- CI is not fully green yet:
  - The PR passes all the checkpoint tests but fails in some places in CI (mostly in tiered-storage-related tests).
  - Some tests take significantly longer to pass than on develop:
    - test_bug010.test_bug010.test_checkpoint_dirty (creates 140 tables)
    - many-dhandle-stress
- Possible implementation/design improvements:
  - Since we always know how many dhandles need to be checkpointed, we can preallocate memory for the shared queue and make it lock-free. However, this approach may have its own performance drawbacks.
  - WT already creates threads for eviction, and this patch adds more WT-created threads for checkpointing. It is worth considering how all these threads affect system utilization and whether they cause under- or oversubscription in some cases. One possible approach is a shared thread pool for both eviction and checkpoint.
  - If a checkpoint doesn't have enough work to do, running it in parallel can be significantly slower than its sequential version. So it would be useful to support both modes, plus a mechanism to decide whether a given checkpoint should be done by a single thread or by multiple threads. Off the top of my head, this heuristic could take into account the number of dirty btrees and the average number of dirty pages per btree.

- Address the remaining comments in the PR that start with // (so that they trigger CI warnings) and are marked with ??? for later discussion.
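
A minimal sketch of the single- vs multi-threaded heuristic from the list above follows. It assumes that a dirty-page count for each dirty btree is available when the checkpoint starts. The function name and both thresholds are hypothetical placeholders that would need tuning with benchmarks; they are not taken from the PR. Capping the thread count at the number of dirty btrees is an extra assumption, since a worker with no btree to reconcile has nothing to do.

```python
# Hypothetical thresholds: placeholders to be tuned with benchmarks,
# not values taken from the PR.
MIN_DIRTY_BTREES = 8
MIN_AVG_DIRTY_PAGES = 100


def choose_checkpoint_threads(dirty_pages_per_btree, max_workers,
                              min_btrees=MIN_DIRTY_BTREES,
                              min_avg_pages=MIN_AVG_DIRTY_PAGES):
    """Return how many threads a checkpoint should use (1 means sequential).

    dirty_pages_per_btree: one dirty-page count per dirty btree.
    """
    num_btrees = len(dirty_pages_per_btree)
    # Too few dirty btrees, or no spare threads: parallelism cannot pay off.
    if max_workers <= 1 or num_btrees < min_btrees:
        return 1
    # Too little work per btree: coordination overhead would dominate.
    avg_pages = sum(dirty_pages_per_btree) / num_btrees
    if avg_pages < min_avg_pages:
        return 1
    # Never start more workers than there are btrees to hand out.
    return min(max_workers, num_btrees)
```

For example, `choose_checkpoint_threads([500] * 20, max_workers=4)` returns 4, while 50 btrees with a single dirty page each fall back to the sequential path.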

Even without these improvements, the PR shows a significant improvement on both x86 and ARM for "checkpoint-stress - Update", but it also causes regressions in some other benchmarks. I didn’t have enough time to determine whether all of these regressions are consistent, so the performance is worth re-evaluating.


Attachments:
- parallel-checkpoint-v1-arm.pdf (15.60 MB), Ivan Kochin, May 16 2025 04:47:31 AM UTC
- parallel-checkpoint-v1-x86.pdf (15.67 MB), Ivan Kochin, May 16 2025 04:47:32 AM UTC

Assignee: Ivan Kochin
Reporter: Ivan Kochin
Votes: 0
Watchers: 1

Created: May 16 2025 04:44:45 AM UTC
Updated: May 16 2025 05:09:44 AM UTC
