Type: Bug
Resolution: Unresolved
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: None
Component/s: Cursors
Security Level: Public (Available to anyone on the web)
Labels:
- dc
- disagg
- expedite
- layered-cursor
- lc_bulk_04_29_26
- na-mdb

Assigned Teams: Storage Engines - Foundations
Total Hours with Assigned Team: 1,129.318
Epic Link: Disagg Bugs
Sprint: SE Foundations - 2026-05-22, SE Foundations - 2026-06-09
Story Points: 5

Summary

When a layered cursor on the follower performs a write that depends on a pre-existing value (remove / update / insert-existence check), it only consults the session-visible state of the stable constituent. A committed stop_ts on the stable cell can be invisible at the session's read_ts but visible to the drain. This lets the session issue a write on a key that, from the drain's timestamp-independent view, has nothing to operate on — producing unresolvable state at drain time (e.g. the __layered_assert_tombstone_has_value_on_stable_btree assert seen in ~~WT-17240~~).

Root cause

Layered cursor writes go to the ingest btree, but the "does this key have a live value?" decision is based on what the session can read at its read_ts from the stable btree. MVCC's normal write-write conflict detection is per-btree, so a newer committed stop on stable is not surfaced to an ingest write path.
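The gap between the two views can be illustrated with a small standalone model. This is only a sketch: `model_tw`, `TS_NONE` and the three helper functions are hypothetical names invented for illustration, not WiredTiger's actual `WT_TIME_WINDOW` layout or visibility code, and the model ignores transaction IDs, prepared state and durable timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no stop timestamp on this cell". */
#define TS_NONE UINT64_MAX

/* Toy time window: a simplification, not the real WT_TIME_WINDOW. */
typedef struct {
    uint64_t start_ts; /* commit timestamp of the value */
    uint64_t stop_ts;  /* commit timestamp of the delete, or TS_NONE */
} model_tw;

/* Timestamp-independent question: does the cell carry a committed stop? */
static bool
model_tw_has_stop(const model_tw *tw)
{
    assert(tw != NULL);
    return (tw->stop_ts != TS_NONE);
}

/* Session view: the value is readable at read_ts if it started at or before
 * read_ts and its stop (if any) is later than read_ts. */
static bool
model_session_sees_value(const model_tw *tw, uint64_t read_ts)
{
    assert(tw != NULL);
    return (tw->start_ts <= read_ts && read_ts < tw->stop_ts);
}

/* Drain view: the key is live only if the cell has no stop at all. */
static bool
model_drain_sees_value(const model_tw *tw)
{
    return (!model_tw_has_stop(tw));
}
```

With a cell of `{start_ts = 10, stop_ts = 50}` and `read_ts = 30`, the session helper reports a value while the drain helper reports none, which is the divergence described above.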

Concretely, in cur_layered.c __clayered_remove_follower:

- When positioned=true and current_cursor == stable_cursor: no check is made; the tombstone is written unconditionally.
- When positioned=false: __clayered_lookup → __clayered_lookup_constituent → stable_cursor->search() returns V because read_ts < stop_ts, even though the stable cell carries a committed stop.

In both cases, the session writes a tombstone to ingest for a key whose stable cell already has a stop. On the next drain, __layered_assert_tombstone_has_value_on_stable_btree fires because has_value=false (the stable cell has HAS_STOP=true) and the ingest tombstone is not globally visible.

Reproducer

test/format in disagg.mode=switch with ops.prepare=1 and preserve_prepared=1, enabled by ~~WT-15795~~. See ~~WT-17240~~ for a concrete stack trace and the aborted-prepared tombstone observed at conn_layered_ingest.c:309:

1. Leader period: K=V with committed stop_ts=S on stable.
2. Stepdown to follower.
3. Follower prepared txn reads K at read_ts=R < S → stable cell's stop is invisible at R → sees V → DELETE K → tombstone in ingest.
4. Rollback turns it into an aborted-prepared rollback marker.
5. First stepup + drain → assert fires.
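The sequence above can be walked through on a toy single-key model. This is a sketch only: `model_key` and both functions are hypothetical names, the rollback marker is collapsed into a single flag, and the drain condition omits the "tombstone is globally visible" escape hatch that the real assert also considers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TS_NONE UINT64_MAX /* hypothetical "no stop timestamp" sentinel */

/* Toy state for one key K; not a WiredTiger structure. */
typedef struct {
    uint64_t stable_start_ts;
    uint64_t stable_stop_ts; /* S, or TS_NONE if there is no committed stop */
    bool ingest_tombstone;   /* tombstone, or its aborted-prepared marker, in ingest */
} model_key;

/* Follower delete without the guard: the decision uses only what is visible
 * at read_ts. Returns true if a tombstone was written to ingest. */
static bool
model_follower_delete_unguarded(model_key *k, uint64_t read_ts)
{
    bool visible;

    assert(k != NULL);
    visible = k->stable_start_ts <= read_ts && read_ts < k->stable_stop_ts;
    if (visible)
        k->ingest_tombstone = true; /* in this model the marker outlives rollback */
    return (visible);
}

/* Simplified drain-time condition: an ingest tombstone needs a live value on
 * stable. Returns true when the state is consistent. */
static bool
model_drain_check(const model_key *k)
{
    bool has_value;

    assert(k != NULL);
    has_value = (k->stable_stop_ts == TS_NONE);
    return (!k->ingest_tombstone || has_value);
}
```

Starting from `{10, 50, false}` and deleting at `read_ts = 30` leaves a tombstone in ingest, after which `model_drain_check` returns false; in the model that corresponds to the assert firing on the first drain.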

Insert and update have the same problem

The issue is not specific to remove. Any layered-cursor write that consults "does K exist live on stable?" to decide what to write to ingest is subject to the same staleness:

- __clayered_insert / insert-existence check (e.g. no-overwrite mode): the session reads at R, sees V on stable (because stable's stop is invisible at R), and concludes "key exists, insert would be a duplicate". Conversely, it may treat a key as absent when the stable cell carries an invisible-to-R stop that the drain will honor. Either decision diverges from the drain's view.
- __clayered_update / modify-follower path (see __clayered_modify_follower at cur_layered.c:2572): reads the visible-at-R value from stable as the base for the modify / update, then writes the result to ingest. This is the same timestamp-skew problem: the stable base value may already have a committed stop that the drain respects.

All three operations need the same guard: before writing the ingest update, verify the stable cell's full time window (WT_TIME_WINDOW_HAS_STOP) — the drain's view — not just the session-visible value.

Fix

For the remove path, the minimal guard is in __clayered_remove_follower: when the layered cursor is positioned (or lookup lands) on the stable constituent, check ((WT_CURSOR_BTREE *)clayered->stable_cursor)->upd_value->tw via WT_TIME_WINDOW_HAS_STOP and return WT_NOTFOUND if set. This mirrors the predicate used in __layered_assert_tombstone_has_value_on_stable_btree so the write path and drain assert agree on "nothing to delete."

The same guard shape should be applied to the insert existence check and the update/modify-follower paths to prevent analogous drain-time inconsistencies.
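A minimal sketch of the remove-path guard, written against a toy cursor rather than the real code. Everything here is an assumption made for illustration: `model_cursor`, `model_tw`, `MODEL_NOTFOUND` (a stand-in for WT_NOTFOUND) and `model_remove_follower_guarded` are hypothetical, and the actual change would read the time window from the stable constituent's upd_value as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TS_NONE UINT64_MAX  /* hypothetical "no stop timestamp" sentinel */
#define MODEL_NOTFOUND (-1) /* hypothetical stand-in for WT_NOTFOUND */

typedef struct {
    uint64_t start_ts;
    uint64_t stop_ts; /* TS_NONE if the cell has no committed stop */
} model_tw;

/* Toy layered cursor state for a single key. */
typedef struct {
    model_tw stable_tw;        /* time window of the stable cell */
    bool positioned_on_stable; /* already positioned on the stable constituent */
    bool ingest_tombstone;     /* set when a tombstone is written to ingest */
} model_cursor;

/* Follower remove with the guard applied on both branches. */
static int
model_remove_follower_guarded(model_cursor *c, uint64_t read_ts)
{
    assert(c != NULL);

    /* Unpositioned branch: the lookup must find a value visible at read_ts. */
    if (!c->positioned_on_stable) {
        bool visible = c->stable_tw.start_ts <= read_ts && read_ts < c->stable_tw.stop_ts;
        if (!visible)
            return (MODEL_NOTFOUND);
    }

    /* Guard: consult the full time window, independent of read_ts. A committed
     * stop means the drain will find nothing to delete. */
    if (c->stable_tw.stop_ts != TS_NONE)
        return (MODEL_NOTFOUND);

    c->ingest_tombstone = true;
    return (0);
}
```

In this model both the positioned and the unpositioned branch return `MODEL_NOTFOUND` for a cell with a stop at 50 when removing at `read_ts = 30`, and no tombstone reaches ingest. A cell without a stop still accepts the remove.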

Layered cursor writes on follower should check for potential write conflicts
