Type: Bug
Resolution: Unresolved
Priority: Critical - P2
None
Affects Version/s: None
Component/s: Layered Tables, Truncate
-
None
-
Storage Engines - Foundations
-
299.11
-
None
-
None
__layered_clear_ingest_table writes transactional no-timestamp tombstones that eviction cannot safely reconcile on the ingest btree, causing __rec_fill_tw_from_upd_select assertion failure (vpack=NULL) in AF-17117 and WT-17354.
Problem
During follower-to-leader step-up, __layered_clear_ingest_table is called after __layered_copy_ingest_table has drained ingest content into the stable table. The clear function uses a transactional truncate() to wipe the ingest btree:
/*
 * __layered_clear_ingest_table --
 *     After ingest content has been drained to the stable table, clear out the ingest table.
 */
static int
__layered_clear_ingest_table(WT_SESSION_IMPL *session, const char *uri)
{
    WT_ASSERT(session, WT_URI_IS_INGEST(uri));

    /*
     * Truncate needs a running txn. We should probably do something more like the history store and
     * make this non-transactional -- this happens during step-up, so we know there are no other
     * transactions running, so it's safe.
     */
    WT_RET(__wt_txn_begin(session, NULL));

    /*
     * No other transactions are running, we're only doing this truncate, and it should become
     * immediately visible. So this transaction doesn't have to care about timestamps.
     */
    F_SET(session->txn, WT_TXN_TS_NOT_SET);
    WT_RET(session->iface.truncate(&session->iface, uri, NULL, NULL, NULL));
    WT_RET(__wt_txn_commit(session, NULL));

    return (0);
}
This produces a WT_UPDATE_TOMBSTONE for every key in the ingest btree with a transaction ID but no timestamp.
This is confirmed by the verbose log output from the crash:
int __rec_fill_tw_from_upd_select(WT_SESSION_IMPL *, WT_PAGE *, WT_CELL_UNPACK_KV *, WTI_UPDATE_SELECT *, _Bool, WTI_RECONCILE *, WT_UPDATE *):1486:WiredTiger assertion failed: '(vpack != ((void*)0) && vpack->type != (4 << 4))'. No on-disk value is found
update[0]: type=TOMBSTONE txnid=1549215 start_ts=(0, 0) durable_ts=(0, 0) prepare_ts=(0, 0) prepared_id=0 prepare_state=0 flags=0x0
Disk writes from eviction or checkpointing can still run in parallel during step-up, so eviction threads are not blocked from touching the ingest btree while the clear is running and before the tombstones are globally visible. When that happens, __rec_fill_tw_from_upd_select() is called for a tombstone with no on-disk backing cell (vpack == NULL), which trips the assertion above and crashes.
Here is an evergreen patch with verbose logging of the issue. Full logs are also attached to the ticket.
Proposed Fix
Bypass the transaction entirely so tombstones have no txnid and are immediately globally visible to all eviction threads. Writing tombstones with txnid = WT_TXN_NONE makes __wt_txn_upd_visible_all() return true unconditionally, regardless of any concurrent reader's oldest ID.
Alternatively, instead of truncating in place, drop the btree and recreate it empty after draining, which eliminates eviction issues and bypasses reconciliation.
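The visibility argument behind the first option can be sketched with a small standalone model. This is not WiredTiger code: model_tombstone, model_visible_all, MODEL_TXN_NONE and MODEL_TS_NONE are hypothetical names, and the rule they implement (an update is globally visible when its transaction ID is older than the oldest running ID and it either has no timestamp or its timestamp is not newer than the oldest timestamp) is an assumed simplification of the real check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for illustration only; not the WiredTiger definitions. */
#define MODEL_TXN_NONE 0u /* "no transaction ID" */
#define MODEL_TS_NONE 0u  /* "no timestamp" */

/* Hypothetical, reduced view of a tombstone update: just the fields the check needs. */
typedef struct {
    uint64_t txnid;
    uint64_t start_ts;
} model_tombstone;

/*
 * model_visible_all --
 *     Toy "globally visible" check (assumed rule): the update's transaction ID must be older
 *     than the oldest ID any reader may still need, and its timestamp, if set, must not be
 *     newer than the oldest timestamp.
 */
static bool
model_visible_all(const model_tombstone *upd, uint64_t oldest_id, uint64_t oldest_ts)
{
    /* In this model real transaction IDs start above MODEL_TXN_NONE. */
    assert(oldest_id > MODEL_TXN_NONE);

    /* A tombstone written by a transaction that is not yet older than oldest_id fails here. */
    if (upd->txnid >= oldest_id)
        return (false);

    /* No timestamp: nothing further to compare. */
    if (upd->start_ts == MODEL_TS_NONE)
        return (true);

    return (upd->start_ts <= oldest_ts);
}
```

In this model, a tombstone carrying txnid=1549215 with no timestamp (as in the log above) is not globally visible until the oldest ID moves past 1549215, whereas a tombstone with txnid = MODEL_TXN_NONE passes for any valid oldest ID. That is the property the non-transactional clear would rely on.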
is related to: WT-17783 Missing prior (globally visible) updates in reconciliation chain during stepup truncate (Open)