- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Not Applicable
- Storage Engines
In disagg, we have decided not to have any locally persisted data. However, we still call __wt_txn_recover when restarting WT. We should skip recovery on connection open and run an explicit disagg init instead.
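The proposed control flow at connection open could look roughly like the sketch below. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in invented for illustration: none of it is a WiredTiger type or function, and the real __wt_txn_recover has a different signature.

```c
#include <stdbool.h>

/* Hypothetical record of which startup path ran; not a WiredTiger type. */
typedef enum {
    SKETCH_PATH_NONE,
    SKETCH_PATH_RECOVERY,
    SKETCH_PATH_DISAGG_INIT
} sketch_startup_path;

typedef struct {
    bool disagg;              /* Assumed: true when the connection is configured for disagg. */
    sketch_startup_path path; /* Which startup path was taken. */
} sketch_open_state;

/* Stand-in for recovery from local state (the role __wt_txn_recover plays today). */
int
sketch_recover(sketch_open_state *s)
{
    s->path = SKETCH_PATH_RECOVERY;
    return (0);
}

/* Stand-in for the explicit disagg init proposed in this ticket. */
int
sketch_disagg_startup(sketch_open_state *s)
{
    s->path = SKETCH_PATH_DISAGG_INIT;
    return (0);
}

/* Disagg has no locally persisted data, so it skips recovery entirely. */
int
sketch_connection_open(sketch_open_state *s)
{
    if (s->disagg)
        return (sketch_disagg_startup(s));
    return (sketch_recover(s));
}
```

With this split, the recovery stand-in no longer needs to know about disagg at all, which is the shape the requested refactor of __wt_txn_recover is aiming for.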
Things to be mindful of:
- Shared metadata access
  - We should not open shared metadata using WT_DISAGG_METADATA_URI on a standby: doing so opens a live btree, which grants write capabilities to the standby. When configured as a standby, we should read from the metadata checkpoint instead.
- Base write gen
  - This should be derived from the shared metadata checkpoint, not the local metadata. Recall that it should be set when a standby is promoted to leader, when the connection is opened as primary, and when picking up a new checkpoint.
- Timestamps
  - conn->txn_global.stable_timestamp = shared_meta_ckpt_ts
  - conn->txn_global.meta_ckpt_timestamp = shared_meta_ckpt_ts
  - The above two need to be set during initial setup when opening a WT connection, and also when we pick up a checkpoint and when a standby steps up to be primary.
  - Missing this in init will result in the following failure in our evergreen tasks: "Precise checkpoint requires a stable timestamp: Invalid argument".
- Flags
  - F_SET(conn, WT_CONN_RECOVERY_COMPLETE) must be set in this init path. Other places in the code require it to be set before they run:
    - __wt_curstat_init asserts WT_CONN_RECOVERY_COMPLETE before allowing data-source stats.
    - __statlog_log_one only walks btrees for stats logging after this flag is set.
- Startup modes to support
  - mongo starts WT as a standby and then steps up to primary. However, our testing infrastructure can also open WT directly as primary. The implementation should account for both.
- Refactor __wt_txn_recover to remove the disagg argument and strip out disagg-related branches.
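The checklist above can be summarized in a small, self-contained model. This is only a sketch under stated assumptions: every `sketch_`/`SKETCH_` name is hypothetical, the structs are simplified stand-ins rather than WT_CONNECTION_IMPL, and it assumes the shared metadata checkpoint carries both a timestamp and a base write gen and that a primary uses the live btree. It shows the same state being refreshed from the shared checkpoint on init, on checkpoint pickup, and on step-up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_CONN_RECOVERY_COMPLETE. */
#define SKETCH_CONN_RECOVERY_COMPLETE 0x1u

typedef enum { SKETCH_STANDBY, SKETCH_PRIMARY } sketch_role;
typedef enum { SKETCH_META_FROM_CHECKPOINT, SKETCH_META_LIVE_BTREE } sketch_meta_source;

/* Assumed contents of the shared metadata checkpoint. */
typedef struct {
    uint64_t timestamp;
    uint64_t base_write_gen;
} sketch_shared_ckpt;

/* Simplified stand-in for the connection state the ticket mentions. */
typedef struct {
    uint32_t flags;
    sketch_role role;
    sketch_meta_source meta_source;
    uint64_t base_write_gen;
    uint64_t stable_timestamp;
    uint64_t meta_ckpt_timestamp;
} sketch_conn;

/* State derived from the shared checkpoint, never from local metadata. */
void
sketch_apply_ckpt(sketch_conn *conn, const sketch_shared_ckpt *ckpt)
{
    conn->base_write_gen = ckpt->base_write_gen;
    conn->stable_timestamp = ckpt->timestamp;
    conn->meta_ckpt_timestamp = ckpt->timestamp;
}

/* A standby reads the checkpoint; only a primary gets the writable live btree. */
void
sketch_set_role(sketch_conn *conn, sketch_role role)
{
    conn->role = role;
    conn->meta_source =
      role == SKETCH_PRIMARY ? SKETCH_META_LIVE_BTREE : SKETCH_META_FROM_CHECKPOINT;
}

/* Initial setup: works for opening as standby or directly as primary. */
void
sketch_disagg_init(sketch_conn *conn, sketch_role role, const sketch_shared_ckpt *ckpt)
{
    sketch_set_role(conn, role);
    sketch_apply_ckpt(conn, ckpt);
    conn->flags |= SKETCH_CONN_RECOVERY_COMPLETE;
}

/* Standby picks up a new checkpoint: refresh derived state only. */
void
sketch_pick_up_checkpoint(sketch_conn *conn, const sketch_shared_ckpt *ckpt)
{
    sketch_apply_ckpt(conn, ckpt);
}

/* Standby is promoted: switch role, then refresh derived state. */
void
sketch_step_up(sketch_conn *conn, const sketch_shared_ckpt *ckpt)
{
    sketch_set_role(conn, SKETCH_PRIMARY);
    sketch_apply_ckpt(conn, ckpt);
}

/* Models the kind of precondition the ticket attributes to __wt_curstat_init. */
void
sketch_data_source_stats(const sketch_conn *conn)
{
    assert(conn->flags & SKETCH_CONN_RECOVERY_COMPLETE);
}
```

Funnelling all three events through one helper is a design choice of this sketch, not something the ticket prescribes; it simply makes it harder to forget one of the fields in one of the paths.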
- related to: WT-15958 Refactor disagg startup to skip recovery (Open)