- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Overflow Items, Reconciliation
- None
- Storage Engines, Storage Engines - Persistence
- SE Persistence - 2025-10-24
- 5
- v8.2, v8.0, v7.0
Reconciliation of pages containing overflow keys can leak overflow pages when a page split fails during a bulk insert.
Error Signature
This can be detected by running verify on the table. Verify reports that some checkpoint ranges were never verified, because the address ranges of the leaked pages remain in the allocated extent list.
WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: __verify_ckptfrag_chk, 526: checkpoint ranges never verified: 1
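The check behind this message can be pictured with a toy model. The sketch below is not WiredTiger's verify code; the function name and the (offset, size) tuples are assumptions made purely for illustration. It treats verify as comparing the extents the checkpoint has allocated against the blocks reached while walking the tree, and reports whatever is left over.

```python
# Toy model only: not WiredTiger's verify implementation.
def unverified_ranges(allocated, visited):
    """Return allocated (offset, size) extents that were never visited."""
    visited_set = set(visited)
    return [ext for ext in allocated if ext not in visited_set]


# Extents the checkpoint believes are in use (hypothetical values).
allocated = [(4096, 4096), (8192, 4096), (12288, 8192)]

# Blocks reached by walking the tree. The orphaned overflow block at
# offset 12288 is not referenced by any page, so it is never visited.
visited = [(4096, 4096), (8192, 4096)]

leaked = unverified_ranges(allocated, visited)
print(f"checkpoint ranges never verified: {len(leaked)}")
```

In this model an orphaned overflow block shows up as exactly one leftover range, which matches the count of 1 in the error above.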
Steps
1. A bulk cursor inserts k/v pairs, writing an overflow page for any key that becomes an overflow item
2. The page size reaches the split ratio
3. Reconciliation begins to split the page
4. The page split returns EBUSY (this can happen for a multitude of reasons)
5. The overflow page is orphaned, as the error path in the page split does not account for any overflow pages already written
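The ordering in the steps above is what matters: the overflow block is written before the split is attempted, so a later failure leaves it behind. A minimal sketch of that timeline follows, assuming a hypothetical ToyBlockManager and toy_bulk_insert that stand in for the real block manager and bulk insert path; none of these names or sizes come from WiredTiger.

```python
import errno


class ToyBlockManager:
    """Hypothetical stand-in for a block manager; tracks live block addresses."""

    def __init__(self):
        self.next_addr = 0
        self.live = set()

    def write(self, size):
        addr = self.next_addr
        self.next_addr += size
        self.live.add(addr)
        return addr


def toy_bulk_insert(bm, keys, max_inline, fail_split):
    """Replay steps 1-5 and return (ret, page_addr)."""
    page_bytes = 0
    for key in keys:
        if len(key) > max_inline:
            # Step 1: the overflow key is written to its own block right away.
            bm.write(len(key))
            page_bytes += 8  # the page keeps only a small reference (toy size)
        else:
            page_bytes += len(key)
    # Steps 2-4: the page is large enough to split, and the split write fails.
    if fail_split:
        # Step 5: the error path returns without freeing the overflow block.
        return errno.EBUSY, None
    return 0, bm.write(page_bytes)


bm = ToyBlockManager()
ret, page_addr = toy_bulk_insert(
    bm, [b"a" * 10, b"b" * 5000], max_inline=100, fail_split=True)
print(ret == errno.EBUSY, sorted(bm.live))
```

After the failed call, the block manager still holds one live block that no page refers to, which is the orphaned overflow page from step 5.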
Stack
__curbulk_insert_row()
│
▼
__wt_bulk_insert_row()
↳ Internal calls:
→ __rec_cell_build_leaf_key()
→ __wti_rec_cell_build_ovfl()
→ __rec_write()
│
▼
__wti_rec_split_crossing_bnd()
│
▼
__wti_rec_split()
│
▼
__rec_split_write()
↳ Result: returns EBUSY
Reproducer
test_ovfl01.py follows the steps above, and wt-15739.diff introduces a failpoint that returns EBUSY during __rec_split_write.
Discussion
The failpoint in the diff is crude and does not narrow down what could cause an EBUSY at this point. I suspect the checkpoint check just below could be one cause; however, I have not yet been able to hit it in testing.
if (!last_block && __wt_btree_syncing_by_other_session(session)) {
    WT_STAT_CONN_DSRC_INCR(
      session, cache_eviction_blocked_multi_block_reconciliation_during_checkpoint);
    return (__wt_set_return(session, EBUSY));
}
The bulk load path does not use the overflow page tracking logic.
/*
 * Track the overflow record (unless it's a bulk load, which by definition won't ever reuse
 * a record).
 */
if (!r->is_bulk_load)
    WT_ERR(__wti_ovfl_reuse_add(session, page, addr, size, kv->buf.data, kv->buf.size));
The normal reconciliation path looks like:
__ovfl_reuse_wrapup_err
__wti_ovfl_track_wrapup_err
__rec_write_err
__reconcile -> (ret = EBUSY)
__wt_reconcile
__evict_reconcile
__wt_evict
__evict_page
__evict_lru_pages
__evict_pass
__evict_server
__evict_thread_run
__thread_run
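Here, __ovfl_reuse_wrapup_err cleans up the newly written overflow pages. Because the bulk load path never adds its overflow pages to this tracking, that cleanup has nothing to free there.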
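The difference between the two paths can be sketched with a toy tracker. This is a simplified model under stated assumptions, not the WiredTiger implementation: ToyOverflowTracker, toy_reconcile and the integer addresses are hypothetical names and values chosen for illustration, and the failure is forced unconditionally.

```python
import errno


class ToyOverflowTracker:
    """Hypothetical tracker: remembers overflow blocks written in one pass."""

    def __init__(self):
        self.new_blocks = []

    def add(self, addr):
        self.new_blocks.append(addr)

    def wrapup_err(self, allocated):
        # On error, free every block that was registered during this pass.
        for addr in self.new_blocks:
            allocated.discard(addr)
        self.new_blocks.clear()


def toy_reconcile(allocated, ovfl_addrs, is_bulk_load):
    """Write overflow blocks, then fail; only tracked blocks are cleaned up."""
    tracker = ToyOverflowTracker()
    for addr in ovfl_addrs:
        allocated.add(addr)
        if not is_bulk_load:
            tracker.add(addr)  # the bulk load path skips this registration
    # Simulated failure: the error path can only free what was tracked.
    tracker.wrapup_err(allocated)
    return errno.EBUSY


normal = set()
toy_reconcile(normal, [100, 200], is_bulk_load=False)
bulk = set()
toy_reconcile(bulk, [100, 200], is_bulk_load=True)
print(sorted(normal), sorted(bulk))
```

In the model, the normal path ends with no blocks allocated, while the bulk path keeps both overflow blocks allocated after the same failure.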
causes
- WT-15849 test_ovfl01 TypeError: in method 'Cursor__freecb', argument 1 of type 'struct __wt_cursor *' (Open)
is depended on by
- WT-15775 Add reconciliation split fail point to our testing frameworks (Closed)
is related to
- WT-15803 Fix ref not unlocked in error cases (Closed)
- WT-15839 Fix flag generation in connection.h (Closed)
- WT-15841 s_fast processes all files if no changes are detected (Closed)
- WT-15799 Make WT atomics API consistent (Closed)
- WT-15836 Clean up 4-bit packing code (Closed)
- WT-15838 Add logging if checkpoint is blocked by eviction for more than 1 minute (Closed)
- WT-15903 Unexpected standard output in test_ovfl01 (Closed)
related to
- WT-15651 Add `PALM` test tasks to Evergreen builds (Closed)
- WT-15775 Add reconciliation split fail point to our testing frameworks (Closed)
- WT-15849 test_ovfl01 TypeError: in method 'Cursor__freecb', argument 1 of type 'struct __wt_cursor *' (Open)
- WT-15491 test_cc09 checkpoint cleanup dirtied too many pages (Closed)
- WT-15662 test_truncate29 verify return EBUSY (Closed)
- WT-15682 Increase the read timestamp lag to avoid conflict with prepare timestamp (Closed)
- WT-15903 Unexpected standard output in test_ovfl01 (Closed)