In order to minimize the I/O required for checkpoints, we need to know how dirty the cache is.
I think we could measure the dirtiness of the cache in bytes or in pages: bytes if we believe reconciling a page with many changes is more expensive than reconciling a page with only a few, and pages if we believe reconciling a page costs roughly the same regardless of how many changes it has.
I'm inclined to go with pages: the real cost seems to be the I/O, which is better described by a dirty-page count. A dirty-byte count is only interesting when a page has so many appends or inserts that it splits into multiple pages; that is, a really large dirty-byte count reflects the probability of multiple I/Os during a single page reconciliation.
What if we:
1. have the serialization function increment a global counter when a page is first marked dirty (that is, if the page's write generation and disk generation are the same), and have the eviction server increment a corresponding counter when a page is reconciled
2. add an eviction_dirty configuration string, which is an absolute value (1-N): the number of pages that can be dirty before eviction wakes up and starts pushing dirty pages out of the cache
3. change eviction to respect that new configuration value, and to distinguish the case where we're only pushing dirty pages out
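To make the three steps concrete, here is a minimal sketch using C11 atomics. All of the names (`page_state`, `cache_state`, `page_mark_dirty`, `eviction_should_wake`) are hypothetical and are not existing WiredTiger identifiers. The sketch assumes a page counts as dirty whenever its write generation differs from its disk generation, that `page_mark_dirty` is only called from the serialization function (so the check and the generation bump cannot race with another writer), and that eviction wakes once the dirty count exceeds `eviction_dirty`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state; field names are illustrative only. */
struct page_state {
    _Atomic uint32_t write_gen;   /* bumped on every update to the page */
    _Atomic uint32_t disk_gen;    /* write_gen as of the last reconciliation */
};

/* Hypothetical cache-wide state. */
struct cache_state {
    _Atomic uint64_t pages_dirtied;  /* step 1: clean -> dirty transitions */
    _Atomic uint64_t pages_cleaned;  /* step 1: successful reconciliations */
    uint64_t eviction_dirty;         /* step 2: threshold, 0 means disabled */
};

/* Step 1: called from the serialization function for every update. */
static void
page_mark_dirty(struct cache_state *cache, struct page_state *page)
{
    uint32_t write_gen = atomic_load(&page->write_gen);

    /* Only the first update after a reconciliation counts the page. */
    if (write_gen == atomic_load(&page->disk_gen))
        atomic_fetch_add(&cache->pages_dirtied, 1);
    atomic_store(&page->write_gen, write_gen + 1);
}

/* Approximate dirty-page count: the two counters are read separately. */
static uint64_t
cache_dirty_pages(struct cache_state *cache)
{
    uint64_t dirtied = atomic_load(&cache->pages_dirtied);
    uint64_t cleaned = atomic_load(&cache->pages_cleaned);

    return (dirtied > cleaned ? dirtied - cleaned : 0);
}

/* Step 3: the check the eviction server would make before pushing pages. */
static bool
eviction_should_wake(struct cache_state *cache)
{
    return (cache->eviction_dirty != 0 &&
        cache_dirty_pages(cache) > cache->eviction_dirty);
}
```

Keeping two monotonically increasing counters rather than one up/down counter matches the wording of step 1, and it means the dirtying path and the cleaning path never write the same variable.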
Questions:
- In Berkeley DB, the configuration is based on the percentage of the cache that needs to be clean, but with a variable-sized set of pages, I think an absolute number of pages to push at checkpoint is a better measurement.
- It's tricky to know when a dirty page becomes clean without going through the serialization thread. Since reconciliation can be done by any thread, and a page can be dirtied while it's being reconciled, there are obvious races. I think it works if we increment the clean page counter whenever we manage to set the disk generation:
    /*
     * If the write succeeded, no updates were skipped and the disk
     * generation has not changed in the meantime, update it to the write
     * generation when reconciliation started. If we managed to clean a
     * page, increment the count of clean pages.
     */
    if (!r->upd_skipped)
        if (WT_ATOMIC_CAS(mod->disk_gen, r->orig_disk_gen, r->orig_write_gen))
            WT_ATOMIC_ADD(clean_page_counter, 1);
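For anyone who wants to experiment with the race outside the tree, the same idea can be sketched with standard C11 atomics in place of the `WT_ATOMIC_*` macros. The structures and the `rec_begin`/`rec_finish` helpers below are hypothetical stand-ins that borrow the field names from the snippet above. The point of the sketch is that when two threads reconcile the same page from the same starting generations, only the one whose compare-and-swap succeeds increments the counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the page's modify structure. */
struct page_modify {
    _Atomic uint32_t write_gen;
    _Atomic uint32_t disk_gen;
};

/* Hypothetical stand-in for per-reconciliation state. */
struct reconcile {
    uint32_t orig_write_gen;  /* write_gen when reconciliation started */
    uint32_t orig_disk_gen;   /* disk_gen when reconciliation started */
    bool upd_skipped;         /* set if any update could not be written */
};

static _Atomic uint64_t clean_page_counter;

/* Sample both generations before writing the page. */
static void
rec_begin(struct reconcile *r, struct page_modify *mod)
{
    r->orig_disk_gen = atomic_load(&mod->disk_gen);
    r->orig_write_gen = atomic_load(&mod->write_gen);
    r->upd_skipped = false;
}

/* Returns true if this thread advanced disk_gen and counted the page. */
static bool
rec_finish(struct reconcile *r, struct page_modify *mod)
{
    uint32_t expected = r->orig_disk_gen;

    if (r->upd_skipped)
        return (false);

    /* Fails if another thread changed disk_gen since rec_begin. */
    if (!atomic_compare_exchange_strong(
        &mod->disk_gen, &expected, r->orig_write_gen))
        return (false);

    atomic_fetch_add(&clean_page_counter, 1);
    return (true);
}
```

One case this sketch leaves open: if the page is updated while it is being reconciled, `write_gen` is already ahead of `orig_write_gen` when the swap succeeds, so the page is counted as cleaned even though its write and disk generations still differ.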
Thoughts?