- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Overflow Items, Reconciliation
- None
- Storage Engines, Storage Engines - Transactions
- SE Transactions - 2025-11-07
- 5
- v8.2, v8.0, v7.0
Reconciliation of pages containing overflow keys can leak overflow pages when a page split fails during a bulk insert.
Error Signature
This can be detected by running verify on the table. Verify reports checkpoint ranges that were never verified, because the address ranges of the leaked overflow pages remain in the allocated extent list.
WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: __verify_ckptfrag_chk, 526: checkpoint ranges never verified: 1
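When scanning test output for this signature, a small helper along these lines may be handy. It is only a sketch: the function name `never_verified_count` is hypothetical, and the pattern assumes the message text stays exactly as shown in the error line above.

```python
import re

# Pattern taken from the verify error message quoted in this ticket.
_NEVER_VERIFIED = re.compile(r"checkpoint ranges never verified: (\d+)")


def never_verified_count(log_text):
    """Return the count from the first matching verify error, or None."""
    match = _NEVER_VERIFIED.search(log_text)
    return int(match.group(1)) if match else None


line = (
    "WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: __verify_ckptfrag_chk, 526: "
    "checkpoint ranges never verified: 1"
)
count = never_verified_count(line)
```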
Steps
1. A bulk cursor inserts k/v pairs, writing an overflow page for any key that becomes an overflow item
2. The page size reaches the split ratio
3. Reconciliation begins to split the page
4. The page split returns EBUSY (this can happen for a multitude of reasons)
5. The overflow page is orphaned as the error path in the page split does not account for any overflow pages already written
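The steps above can be modelled with a toy allocator. This is a simplified sketch and not WiredTiger code: `ToyBlockManager`, `bulk_insert`, `never_verified` and the `split_fails` flag are all hypothetical names, and the "verify" check is reduced to comparing allocated addresses against addresses referenced by written pages.

```python
import errno


class ToyBlockManager:
    """Hypothetical stand-in for a block allocator; not WiredTiger code."""

    def __init__(self):
        self.allocated = set()  # addresses currently on the "allocated" list
        self._next_addr = 0

    def write(self):
        addr = self._next_addr
        self._next_addr += 1
        self.allocated.add(addr)
        return addr


def bulk_insert(bm, keys, max_inline_key, split_fails):
    """Return (errno-style code, addresses referenced by written pages)."""
    referenced = set()
    pending_ovfl = []  # overflow blocks written for the in-progress page
    for key in keys:
        if len(key) > max_inline_key:
            # Step 1: the overflow block is written immediately.
            pending_ovfl.append(bm.write())
    # Steps 2-4: the page is split; a failpoint may force EBUSY.
    if split_fails:
        # Step 5: the error path returns without freeing pending_ovfl.
        return errno.EBUSY, referenced
    referenced.add(bm.write())  # the leaf page itself
    referenced.update(pending_ovfl)
    return 0, referenced


def never_verified(bm, referenced):
    """Verify-like check: allocated addresses that no page references."""
    return sorted(bm.allocated - referenced)


bm = ToyBlockManager()
ret, refs = bulk_insert(bm, ["k" * 100, "short"], max_inline_key=10, split_fails=True)
leaked = never_verified(bm, refs)
```

In this model the single overflow block stays allocated after the failed split, which corresponds to the "checkpoint ranges never verified" complaint; with `split_fails=False` every allocated address is referenced.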
Stack
__curbulk_insert_row()
  │
  ▼
__wt_bulk_insert_row()
  ↳ Internal calls:
      → __rec_cell_build_leaf_key()
      → __wti_rec_cell_build_ovfl()
      → __rec_write()
  │
  ▼
__wti_rec_split_crossing_bnd()
  │
  ▼
__wti_rec_split()
  │
  ▼
__rec_split_write()
  ↳ Result: returns EBUSY
Reproducer
test_ovfl01.py follows the steps above, and wt-15739.diff introduces a failpoint that returns EBUSY from __rec_split_write.
Discussion
The failpoint in the diff is crude and does not narrow down what could actually cause an EBUSY at this point. I suspect the checkpoint check just below could be one cause, but I have not yet been able to hit it in testing.
    if (!last_block && __wt_btree_syncing_by_other_session(session)) {
        WT_STAT_CONN_DSRC_INCR(
          session, cache_eviction_blocked_multi_block_reconciliation_during_checkpoint);
        return (__wt_set_return(session, EBUSY));
    }
The bulk load path does not use the overflow page tracking logic.
    /*
     * Track the overflow record (unless it's a bulk load, which by definition won't ever reuse
     * a record).
     */
    if (!r->is_bulk_load)
        WT_ERR(__wti_ovfl_reuse_add(session, page, addr, size, kv->buf.data, kv->buf.size));
The normal reconciliation path looks like:
    __ovfl_reuse_wrapup_err
    __wti_ovfl_track_wrapup_err
    __rec_write_err
    __reconcile -> (ret = EBUSY)
    __wt_reconcile
    __evict_reconcile
    __wt_evict
    __evict_page
    __evict_lru_pages
    __evict_pass
    __evict_server
    __evict_thread_run
    __thread_run
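Here, __ovfl_reuse_wrapup_err cleans up the newly written overflow pages. Because the bulk load path never adds its overflow pages to this tracking, there is nothing for that cleanup to free when the split fails.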
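The difference between the two paths can be illustrated with a small model. This is a sketch under assumptions, not the actual reconciliation code: `ToyReconciler`, `write_overflow`, `wrapup_err` and `blocks_left_after_failed_reconcile` are hypothetical names, and the only behaviour taken from the source is that tracking is skipped when `is_bulk_load` is set.

```python
class ToyReconciler:
    """Hypothetical model of overflow tracking; names are illustrative only."""

    def __init__(self, is_bulk_load):
        self.is_bulk_load = is_bulk_load
        self.allocated = set()  # overflow blocks written so far
        self.tracked = []       # overflow blocks the error path knows about
        self._next_addr = 0

    def write_overflow(self):
        addr = self._next_addr
        self._next_addr += 1
        self.allocated.add(addr)
        # Mirrors the quoted condition: bulk loads skip tracking.
        if not self.is_bulk_load:
            self.tracked.append(addr)
        return addr

    def wrapup_err(self):
        """Error-path cleanup: free only the tracked overflow blocks."""
        for addr in self.tracked:
            self.allocated.discard(addr)
        self.tracked.clear()


def blocks_left_after_failed_reconcile(is_bulk_load):
    """Write one overflow block, fail (as if with EBUSY), then clean up."""
    r = ToyReconciler(is_bulk_load)
    r.write_overflow()
    r.wrapup_err()
    return sorted(r.allocated)


normal_path = blocks_left_after_failed_reconcile(is_bulk_load=False)
bulk_path = blocks_left_after_failed_reconcile(is_bulk_load=True)
```

In the model, the normal path ends with no blocks left allocated, while the bulk path keeps the untracked block. One possible direction for a fix would be for the bulk path's error handling to remember and free the overflow blocks it has written, but that is a suggestion rather than something the ticket establishes.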