Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Security Level: Public (Available to anyone on the web)
Labels:
- Disag_Storage
- lc_bulk_04_29_26

Assigned Teams:

Storage Engines - Transactions
Total Hours with Assigned Team:
5,359.793
Sprint:
None
Story Points:
None

Note: This may not be specific to disaggregated storage, but I noticed it when investigating disagg YCSB benchmarks, so I'm opening an SLS ticket.

I was taking a look at some FTDC data from a throttled YCSB 100 update run, and the metrics confuse me:

The gap between peaks on the checkpoint progress state metric is about 9 seconds. The metrics show that checkpoints are completing in about 150 milliseconds, which is confirmed by the shape of the checkpoint progress state graph. There are two things that surprise me:

1) The checkpoint number of pages caused to be reconciled rate remains at about 140/second even when there isn't a checkpoint running.
2) The checkpoint number of history store pages caused to be reconciled rate statistic is very similar to the non-history store statistic. I read that as one or two non-history store pages triggering reconciliation of ~130 history store pages.

That led me to inspect the code in bt_sync.c, which looks like:

WT_STAT_CONN_INCR(session, checkpoint_pages_reconciled);
WT_STATP_DSRC_INCR(session, btree->dhandle->stats, btree_checkpoint_pages_reconciled);
if (FLD_ISSET(rec_flags, WT_REC_HS))
    WT_STAT_CONN_INCR(session, checkpoint_hs_pages_reconciled);

WT_ERR(__wt_reconcile(session, walk, NULL, rec_flags));

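WT_REC_HS is a flag that allows reconciliation to write content back to the history store. It does not track whether a given page was actually written back to the history store, so checkpoint_hs_pages_reconciled counts pages reconciled with that permission set rather than history store writes.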
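To illustrate why the two statistics could track each other so closely, here is a minimal standalone sketch. It is not WiredTiger code: SKETCH_REC_HS, struct sketch_stats, sketch_record and sketch_simulate are all hypothetical names, and the assumption that every page is reconciled with the flag set is mine, made only to show the effect. It compares a counter keyed on the permission flag with one keyed on whether reconciliation actually reported a history store write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "may write to the history store" flag. */
#define SKETCH_REC_HS 0x1u

struct sketch_stats {
    uint64_t pages_reconciled;  /* every page reconciled by checkpoint */
    uint64_t hs_pages_by_flag;  /* counted whenever the flag is set */
    uint64_t hs_pages_by_write; /* counted only when the history store was written */
};

/* Record one reconciliation under both counting schemes. */
static void
sketch_record(struct sketch_stats *stats, uint32_t rec_flags, bool wrote_to_hs)
{
    stats->pages_reconciled++;
    if ((rec_flags & SKETCH_REC_HS) != 0)
        stats->hs_pages_by_flag++;
    if (wrote_to_hs)
        stats->hs_pages_by_write++;
}

/*
 * Simulate npages reconciliations, all with the flag set (an assumption),
 * where only every hs_every-th page actually writes to the history store.
 */
static struct sketch_stats
sketch_simulate(uint32_t npages, uint32_t hs_every)
{
    struct sketch_stats stats = {0, 0, 0};

    for (uint32_t i = 0; i < npages; i++)
        sketch_record(&stats, SKETCH_REC_HS, hs_every != 0 && i % hs_every == 0);
    return stats;
}
```

Under these assumptions, sketch_simulate(140, 70) leaves hs_pages_by_flag equal to pages_reconciled (140), while hs_pages_by_write is only 2. A flag-keyed counter would therefore mirror the overall reconciled-pages rate regardless of how much history store content is actually written.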

Also, it seems surprising that the statistics are incremented before the reconciliation call rather than after it succeeds. That is probably OK, since a failed reconciliation during checkpoint should generally result in a fatal error for the system.