Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Checkpoints
Labels:
- CI-blocker

Assigned Teams:
Storage Engines - Transactions
Total Hours with Assigned Team:
546.276
Sprint:
SE Transactions - 2026-07-03
Story Points:
1
Evergreen Project:
- wiredtiger
- wiredtiger-disagg
Linked BFG List:
https://buildbaron.corp.mongodb.com/ui/#/bf/WT-17968
Count of Linked BFGs (Last 30 days):
55

Problem

__disagg_pick_up_checkpoint is supposed to refuse (by panicking) to adopt a checkpoint whose oldest_timestamp exceeds the connection's current pinned timestamp. This protects any reader with an active pinned timestamp from having its data pruned out from under it. The check, as written, can never fire.

Root cause

In src/conn/conn_layered_checkpoint_pick_up.c, __disagg_pick_up_checkpoint:

- WT_DISAGG_METADATA metadata is declared and cleared with WT_CLEAR(metadata) near the top of the function.
- The pinned-timestamp panic check runs immediately afterward and reads metadata.oldest_timestamp, which is still zero from the clear.
- Only after that do __wti_disagg_fetch_shared_meta and __wt_disagg_parse_meta populate metadata with the real checkpoint fields, including the real oldest_timestamp.

So the check is effectively 0 > pinned_timestamp, which is always false: when pinned_timestamp is set it is at least 1, and when it is unset the != WT_TS_NONE guard already skips the check. The safety net is dead code. A checkpoint whose real oldest_timestamp exceeds an actively pinned reader's timestamp can be silently adopted.
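The ordering problem can be reproduced outside WiredTiger in a few lines. This is a standalone sketch, not the real function: ts_t, TS_NONE, disagg_meta_t, fetch_and_parse_meta and pick_up_checkpoint_buggy are hypothetical stand-ins for wt_timestamp_t, WT_TS_NONE, WT_DISAGG_METADATA, the fetch/parse pair and __disagg_pick_up_checkpoint. It returns whether the safety check would have panicked, and shows that it never does, whatever the real oldest_timestamp is.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for wt_timestamp_t and WT_TS_NONE. */
typedef uint64_t ts_t;
#define TS_NONE ((ts_t)0)

/* Hypothetical stand-in for WT_DISAGG_METADATA; other checkpoint fields omitted. */
typedef struct {
    ts_t oldest_timestamp;
} disagg_meta_t;

/* Hypothetical stand-in for the fetch + parse step that fills in the real fields. */
static void
fetch_and_parse_meta(disagg_meta_t *meta, ts_t real_oldest)
{
    meta->oldest_timestamp = real_oldest;
}

/*
 * Models the buggy ordering in the pick-up path.
 * Returns true if the pinned-timestamp check would panic.
 */
static bool
pick_up_checkpoint_buggy(ts_t pinned_ts, ts_t real_oldest)
{
    disagg_meta_t metadata;
    bool would_panic;

    memset(&metadata, 0, sizeof(metadata)); /* models WT_CLEAR(metadata) */

    /* BUG: oldest_timestamp is still 0 here, so this evaluates 0 > pinned_ts. */
    would_panic = pinned_ts != TS_NONE && metadata.oldest_timestamp > pinned_ts;

    /* The real oldest_timestamp only arrives here, after the check has run. */
    fetch_and_parse_meta(&metadata, real_oldest);

    return would_panic;
}
```

For example, pick_up_checkpoint_buggy(100, 500) returns false even though the real oldest_timestamp (500) is greater than the pinned timestamp (100).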

Fix

Move the pinned-timestamp check (the __wt_txn_update_pinned_timestamp refresh, the comparison and the panic) to after the __wt_disagg_parse_meta call, so that it reads the real, parsed metadata.oldest_timestamp instead of the zeroed placeholder. No other logic changes.
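A minimal sketch of the corrected ordering, using the same hypothetical stand-ins as a self-contained model (ts_t, TS_NONE, disagg_meta_t, fetch_and_parse_meta and pick_up_checkpoint_fixed are illustrative names, not WiredTiger code). The only change from the buggy model is that the comparison runs after the metadata has been populated; the pinned-timestamp refresh is represented by a comment rather than a real call.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for wt_timestamp_t and WT_TS_NONE. */
typedef uint64_t ts_t;
#define TS_NONE ((ts_t)0)

/* Hypothetical stand-in for WT_DISAGG_METADATA; other checkpoint fields omitted. */
typedef struct {
    ts_t oldest_timestamp;
} disagg_meta_t;

/* Hypothetical stand-in for the fetch + parse step that fills in the real fields. */
static void
fetch_and_parse_meta(disagg_meta_t *meta, ts_t real_oldest)
{
    meta->oldest_timestamp = real_oldest;
}

/*
 * Models the fixed ordering: clear, fetch and parse, then check.
 * Returns true if the pinned-timestamp check would panic.
 */
static bool
pick_up_checkpoint_fixed(ts_t pinned_ts, ts_t real_oldest)
{
    disagg_meta_t metadata;

    memset(&metadata, 0, sizeof(metadata)); /* models WT_CLEAR(metadata) */
    fetch_and_parse_meta(&metadata, real_oldest);

    /*
     * FIX: the real code refreshes the pinned timestamp at this point, then
     * compares against the parsed oldest_timestamp rather than the zeroed one.
     */
    return pinned_ts != TS_NONE && metadata.oldest_timestamp > pinned_ts;
}
```

With the same inputs as before, pick_up_checkpoint_fixed(100, 500) returns true, matching the panic observed after the fix. Equal timestamps such as (100, 100) still pass, because the check is a strict greater-than.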

Verification

Confirmed with a scripted single-scenario repro that forces a checkpoint whose real oldest_timestamp exceeds an active reader's pinned read timestamp:

- Before the fix: adoption silently succeeds. When the reader's cursor later tries to advance, the stale pin surfaces as an unrelated, confusing crash in __clayered_reopen_stable (see the companion ticket for that assertion).
- After the fix: __disagg_pick_up_checkpoint correctly panics at adoption time with "Disaggregated storage checkpoint oldest_timestamp ... is greater than the current pinned timestamp ...".

Open question for the fix's severity/design

Once this check correctly fires, it does so via a hard __wt_panic (full process abort + restart), not a graceful refusal or a per-reader failure. A reader that simply falls behind the checkpoint cadence (e.g. a slow scan under sustained write load) will now crash the whole node rather than just failing or blocking that one reader. Worth a design discussion on whether panic is the intended response here, or whether checkpoint adoption should instead be deferred/refused non-fatally when this condition is detected.
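To make the design question concrete, here is one possible non-fatal alternative, sketched under stated assumptions: adoption is refused with EBUSY and the previously adopted checkpoint stays in place until the pinned reader advances. conn_state_t and try_adopt_checkpoint are hypothetical; this is not an existing WiredTiger API, and how a real caller would schedule the retry is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for wt_timestamp_t and WT_TS_NONE. */
typedef uint64_t ts_t;
#define TS_NONE ((ts_t)0)

/* Hypothetical connection state: which checkpoint is currently adopted. */
typedef struct {
    ts_t adopted_oldest; /* oldest_timestamp of the currently adopted checkpoint */
    unsigned deferred;   /* number of adoption attempts deferred so far */
} conn_state_t;

/*
 * Attempts to adopt a checkpoint. Instead of aborting the process when the
 * checkpoint would prune data still visible to a pinned reader, returns
 * EBUSY and leaves the current checkpoint in place so the caller can retry.
 */
static int
try_adopt_checkpoint(conn_state_t *conn, ts_t pinned_ts, ts_t ckpt_oldest)
{
    if (pinned_ts != TS_NONE && ckpt_oldest > pinned_ts) {
        conn->deferred++;
        return (EBUSY);
    }
    conn->adopted_oldest = ckpt_oldest;
    return (0);
}
```

The trade-off is that a reader that never releases its pin could defer adoption indefinitely, so a design along these lines would likely need a bound on deferrals, or a fallback that fails the lagging reader instead of the whole node.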

Relationship to the __clayered_reopen_stable assertion

This is a separate, non-overlapping bug from the "upgrading a positioned stable cursor" assertion in __clayered_reopen_stable (filed separately). That assertion can fire even when this check is fixed and every adopted checkpoint fully respects the pinned-timestamp invariant - confirmed empirically with a repro that keeps a wide oldest lag so this panic never fires, yet the other assertion still fires reliably. The two bugs operate at different granularities: this one is a connection-wide timestamp invariant; the other is a per-cursor/per-key hazard.

Issue Links

is duplicated by
- WT-18003 failed: format-failure-configs-test on ubuntu2004-arm64 [wiredtiger @ dd48812a] (Closed)

is related to
- WT-17969 Follower layered cursor aborts in __clayered_reopen_stable when a read-timestamped reader's parked stable key vanishes from a legitimately-adopted newer checkpoint (Closed)
- WT-17457 test/format (disagg.mode=multi) __txn_assert_after_reads assert (Closed)

related to
- WT-17969 Follower layered cursor aborts in __clayered_reopen_stable when a read-timestamped reader's parked stable key vanishes from a legitimately-adopted newer checkpoint (Closed)
- WT-18008 [Verify] race-condition-stress-asan-test-3 timeout: possible stall in new HS verify logic (__verify_key_hs) under ASAN (Open)
- WT-17983 failed: format-failure-configs-test on amazon2023-arm64 [wiredtiger @ dd48812a] (Closed)
- WT-18003 failed: format-failure-configs-test on ubuntu2004-arm64 [wiredtiger @ dd48812a] (Closed)
- WT-18040 test_layered_stepup12 does not abort as expected (Closed)