-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Server Security
-
ALL
-
-
Server Security 2026-07-03
The standby node crashes with a fatal invariant failure when two conditions coincide:
- A KEK rotation oplog entry has been applied, advancing the standby's in-memory keystore to KEK v2.
- The checkpoint pick-up thread simultaneously tries to install a checkpoint that was taken on the primary after the in-memory keystore was updated but before the updated keystore was flushed to WiredTiger. That checkpoint therefore contains the v1 keystore LSN.
WiredTiger detects the stale keystore reference and returns EINVAL. The invariantWTOK wrapper at wiredtiger_kv_engine.cpp:2435 turns this error into an unconditional abort().
Observed Behavior
The standby process aborts. The chaos controller detects the dead standby and reports:
[MONGOD BUG] Standby unreachable after rapid_seal_burst
[MONGOD BUG] validate: Standby unreachable for validate
[MONGOD BUG] dbHash: Standby unreachable for dbHash
[PERF] standby UNREACHABLE for 24/41 steady-state samples
Log Evidence
From the attached standby log (port 20061), in chronological order:
01:11:38.817 -- First KEK rotation completes on primary (KEK 1 → 2); oplog entry applied on standby; in-memory keystore now at LSN 7655505607113310340
01:11:43.883 id:40414 Failed to parse KEK Keystore from WiredTiger: "loadFromWT: persisted keystore timestamp 7655505345120305155 is behind last committed timestamp 7655505607113310340"
01:11:43.886 id:11722321 loadKey: Failed to load keys from WT
WT: Failed to pick up disaggregated storage checkpoint for metadata_lsn=7655505607113310321: ret=22
WT: int __wti_disagg_load_crypt_key: key_provider->load_key failed
WT: int __disagg_pick_up_checkpoint: __wti_disagg_load_crypt_key failed
WT: Error at conn/conn_reconfig.c:449: "__wti_disagg_conn_config(session, cfg, true)" failed: EINVAL (22)
01:11:43.892 id:23083 Invariant failure: "_conn->reconfigure(_conn, getCkptMetaConfigString.c_str())" error "BadValue: 22: Invalid argument"
id:23084 aborting after invariant() failure
id:6384300 Got signal: 6 (Aborted)
Root Cause
The primary takes a checkpoint at time T. At that moment:
- The primary's in-memory keystore has already been updated to KEK v2 (oplog write complete).
- The WiredTiger keystore on-disk has not yet been updated (it's updated at the next checkpoint).
So the checkpoint at time T embeds the v1 keystore LSN (7655505345120305155) even though the primary's in-memory state is v2.
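The ordering problem on the primary can be illustrated with a toy model. This is only a sketch: `PrimaryModel` and its members are hypothetical names invented for illustration, not server or WiredTiger types. It assumes a checkpoint captures whatever keystore LSN is currently persisted and that the flush is a separate, later step.

```cpp
#include <cstdint>

// Hypothetical toy model of the primary's keystore state (not a real server type).
struct PrimaryModel {
    std::uint64_t inMemoryKeystoreLsn;   // advanced as soon as the KEK rotation oplog write completes
    std::uint64_t persistedKeystoreLsn;  // advanced only when the keystore is flushed to WiredTiger

    // KEK rotation: only the in-memory keystore moves forward.
    void rotateKek(std::uint64_t newLsn) { inMemoryKeystoreLsn = newLsn; }

    // Keystore flush: the persisted copy catches up with the in-memory one.
    void flushKeystore() { persistedKeystoreLsn = inMemoryKeystoreLsn; }

    // A checkpoint embeds the keystore LSN that is persisted at that moment.
    std::uint64_t takeCheckpoint() const { return persistedKeystoreLsn; }
};
```

In this model, a checkpoint taken between `rotateKek()` and `flushKeystore()` still carries the v1 LSN, which matches the window described above.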
The standby then:
- Applies the KEK rotation oplog entry → its in-memory keystore advances to v2 (LSN 7655505607113310340).
- Picks up the checkpoint from time T via setRecoveryCheckpointMetadata → calls _conn->reconfigure() with metadata_lsn pointing to a checkpoint whose embedded keystore is at v1.
WiredTiger's __wti_disagg_load_crypt_key refuses to load a keystore whose LSN is behind the already-committed in-memory LSN, returning EINVAL. This is the correct behavior from WT's perspective — the keystore appears to have gone backwards.
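The staleness check implied by the "is behind last committed timestamp" log message can be sketched as a single comparison. This is an illustrative model, not the actual key provider or WiredTiger code: `KeystoreState` and `checkPersistedKeystore` are hypothetical names, and the real check may consider more than one value.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for the keystore state tracked on the standby.
struct KeystoreState {
    std::uint64_t lastCommittedLsn;  // advanced when a KEK rotation oplog entry is applied
};

// Returns 0 if the persisted keystore may be loaded, or EINVAL if it is
// behind the already-committed in-memory LSN (the keystore appears to have
// gone backwards).
int checkPersistedKeystore(const KeystoreState& state, std::uint64_t persistedLsn) {
    if (persistedLsn < state.lastCommittedLsn) {
        return EINVAL;
    }
    return 0;
}
```

With the values from the log, a persisted LSN of 7655505345120305155 checked against a committed LSN of 7655505607113310340 yields EINVAL (22) under this model.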
The bug is that invariantWTOK in WiredTigerKVEngine::setRecoveryCheckpointMetadata (wiredtiger_kv_engine.cpp:2435) treats this recoverable error as fatal:
invariantWTOK(_conn->reconfigure(_conn, getCkptMetaConfigString.c_str()), nullptr);
Expected Behavior
The standby should not crash. Options to fix (in order of invasiveness):
- Preferred — skip stale checkpoints: When _conn->reconfigure() returns EINVAL with a stale-keystore diagnostic, log a warning and skip that checkpoint. The standby will pick up the next checkpoint, which will have been taken after the WiredTiger keystore flush and will be consistent.
- Defer checkpoint pick-up until keystore flush: On the primary, do not make a checkpoint visible for standby pick-up until the WiredTiger keystore has been flushed to reflect the current KEK.
- Tolerate backward LSN on standby during oplog replay: When applying a KEK rotation oplog entry, also advance the WiredTiger keystore LSN floor so that the stale check in __wti_disagg_load_crypt_key does not reject it.
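The preferred option might look roughly like the sketch below. It is not a patch against wiredtiger_kv_engine.cpp: `ReconfigureOutcome`, `PickUpResult`, and `pickUpCheckpoint` are hypothetical names, the `reconfigure` callback stands in for `_conn->reconfigure()`, and the `staleKeystore` flag assumes some way of recognizing the stale-keystore diagnostic, which this ticket does not specify. Any other failure still terminates, modeled here with an exception in place of the invariant.

```cpp
#include <cerrno>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical result of a reconfigure attempt: the WiredTiger return code
// plus a flag saying whether the stale-keystore diagnostic was observed.
struct ReconfigureOutcome {
    int ret;
    bool staleKeystore;
};

enum class PickUpResult { kInstalled, kSkippedStale };

// Attempts to install a checkpoint. A stale-keystore EINVAL is logged and
// skipped so that a later, consistent checkpoint can be picked up instead;
// every other error remains fatal.
PickUpResult pickUpCheckpoint(
    const std::function<ReconfigureOutcome(const std::string&)>& reconfigure,
    const std::string& ckptMetaConfig) {
    const ReconfigureOutcome outcome = reconfigure(ckptMetaConfig);
    if (outcome.ret == 0) {
        return PickUpResult::kInstalled;
    }
    if (outcome.ret == EINVAL && outcome.staleKeystore) {
        std::cerr << "warning: skipping checkpoint with stale keystore: "
                  << ckptMetaConfig << '\n';
        return PickUpResult::kSkippedStale;
    }
    // Stand-in for the existing invariant on unexpected errors.
    throw std::runtime_error("reconfigure failed: ret=" + std::to_string(outcome.ret));
}
```

Keeping the skip narrow (EINVAL plus a positive stale-keystore signal) avoids masking unrelated reconfigure failures that should still stop the node.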
Additional Context
- This crash cannot be triggered by pali_chaos.js because that test uses a static key file with no dynamic keystore updates.
- The rapid_seal_burst event reported in the test output is coincidental: it was in progress when the chaos controller's liveness probe detected the already-dead standby (~30 seconds after the actual crash).
- The rotation driver confirmed 2 successful KEK rotations (kekAttempted=2, kekCompleted=2) before the crash, so the driver is correctly exercising the code path.
- depends on
SERVER-129742 PALI Chaos Test: coverage for dynamic KEK generation, KEK rotation, and CMK rotation (In Code Review)
- related to
SERVER-130020 Standby fatal abort when checkpoint pick-up encounters stale keystore LSN after KEK rotation (Needs Scheduling)