- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Reconciliation
- Storage Engines, Storage Engines - Persistence
- SE Persistence - 2025-07-04
- None
- v8.1, v8.0, v7.0, v6.0
We just spotted a failure in __wt_page_inmem:410, which resulted in a panic while updating the history store WiredTigerHS.wt:
__wt_page_inmem:410:encountered an illegal file format or internal value: 0x0
Stack trace:
src/mongo/util/assert_util.cpp:76:56: mongo::(anonymous namespace)::callAbort()
src/mongo/util/assert_util.cpp:222:14: mongo::fassertFailedWithLocation(int, char const*, unsigned int)
src/mongo/util/assert_util.h:344:34: mongo::fassertWithLocation(int, bool, char const*, unsigned int)
src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp:741:9: mongo::(anonymous namespace)::mdb_handle_error_with_startup_suppression(__wt_event_handler*, __wt_session*, int, char const*) (.cold)
src/third_party/wiredtiger/src/support/err.c:450:15: __eventv
src/third_party/wiredtiger/src/support/err.c:552:5: __wt_panic_func
src/third_party/wiredtiger/src/btree/bt_page.c:410:17: __wt_page_inmem
src/third_party/wiredtiger/src/btree/bt_read.c:197:5: __page_read
src/third_party/wiredtiger/src/btree/bt_read.c:307:13: __wt_page_in_func
src/third_party/wiredtiger/src/include/btree_inline.h:2238:11: __wt_page_swap_func.part.0
src/third_party/wiredtiger/src/btree/bt_walk.c:460:19: __wt_page_swap_func
src/third_party/wiredtiger/src/btree/bt_walk.c:460:19: __tree_walk_internal
src/third_party/wiredtiger/src/btree/bt_curnext.c:930:13: __wt_btcur_next
src/third_party/wiredtiger/src/cursor/cur_file.c:188:5: __curfile_next
src/third_party/wiredtiger/src/cursor/cur_hs.c:130:5: __curhs_file_cursor_next
src/third_party/wiredtiger/src/cursor/cur_hs.c:245:5: __curhs_next
src/third_party/wiredtiger/src/history/hs_rec.c:191:13: __hs_insert_record
src/third_party/wiredtiger/src/history/hs_rec.c:704:17: __wt_hs_insert_updates
src/third_party/wiredtiger/src/reconcile/rec_write.c:2693:13: __rec_hs_wrapup
src/third_party/wiredtiger/src/reconcile/rec_write.c:2428:15: __rec_write_wrapup
src/third_party/wiredtiger/src/reconcile/rec_write.c:322:5: __reconcile
src/third_party/wiredtiger/src/reconcile/rec_write.c:95:11: __wt_reconcile
src/third_party/wiredtiger/src/evict/evict_page.c:888:9: __evict_reconcile
src/third_party/wiredtiger/src/evict/evict_page.c:272:9: __wt_evict
src/third_party/wiredtiger/src/evict/evict_lru.c:2403:5: __evict_page
src/third_party/wiredtiger/src/evict/evict_lru.c:1163:20: __evict_lru_pages
src/third_party/wiredtiger/src/evict/evict_lru.c:340:9: __wt_evict_thread_run
src/third_party/wiredtiger/src/support/thread_group.c:31:9: __thread_run
This happened on MongoDB 7.0.20. Please refer to the linked ticket for more details about this cluster.
It seems that WT read a page with an invalid dsk->type in the disk image, but the disk image itself must have passed checksum validation. It is thus possible that WT wrote the page incorrectly to begin with. Another possibility is memory corruption, e.g., if something wrote over the disk image just before writing it, or after reading it (and passing the checksum validation).
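The reasoning above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the 5-byte header layout, the page type values, and the helper names are all hypothetical, and `zlib.crc32` stands in for whatever checksum WiredTiger actually applies to a block. It shows why a zero type that survives checksum validation points at corruption either before the checksum was computed or after it was verified, rather than at damage to the stored bytes.

```python
import struct
import zlib

# Hypothetical page types for this toy format; 0 is deliberately invalid.
PAGE_INVALID = 0
VALID_TYPES = {1, 2, 3}

# Toy header: 1-byte page type followed by a 4-byte CRC32 (little-endian).
HEADER = struct.Struct("<BI")


def build_image(page_type: int, payload: bytes) -> bytearray:
    """Serialize a toy page; the checksum covers the type byte and payload."""
    crc = zlib.crc32(bytes([page_type]) + payload)
    return bytearray(HEADER.pack(page_type, crc) + payload)


def checksum_ok(image: bytearray) -> bool:
    """Recompute the checksum over the image and compare with the stored one."""
    page_type, crc = HEADER.unpack_from(image)
    payload = bytes(image[HEADER.size:])
    return zlib.crc32(bytes([page_type]) + payload) == crc


def type_ok(image: bytearray) -> bool:
    """Check that the type byte holds one of the known page types."""
    return image[0] in VALID_TYPES


def corrupt_after_checksum():
    """The stored bytes are damaged after the checksum was computed."""
    image = build_image(1, b"payload")
    image[0] = PAGE_INVALID
    return checksum_ok(image), type_ok(image)


def corrupt_before_checksum():
    """The buffer already holds a bad type when the checksum is computed."""
    image = build_image(PAGE_INVALID, b"payload")
    return checksum_ok(image), type_ok(image)


def corrupt_after_verification():
    """The buffer is overwritten in memory after the checksum was verified."""
    image = build_image(1, b"payload")
    verified = checksum_ok(image)
    image[0] = PAGE_INVALID
    return verified, type_ok(image)


if __name__ == "__main__":
    print("after checksum:    ", corrupt_after_checksum())
    print("before checksum:   ", corrupt_before_checksum())
    print("after verification:", corrupt_after_verification())
```

In this model only the first scenario is caught by the checksum. The other two reach the type check with a "valid" image and an invalid type, which matches the two possibilities raised above.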
Raising this to P2 for visibility until we can get this triaged, as this could be indicative of data corruption.
is blocked by:
- WT-14872 clang-analyzer gives misleading output (Closed)
is related to:
- WT-14340 Make conn->flags atomic (Closed)
- WT-14848 Fix missing WT_RET in live_restore_fs.c (Closed)
- WT-14864 Remove duplicate TSAN warnings from metric script (Closed)
- WT-14919 Coverity analysis defect 175312: Unused value (Closed)
- WT-14929 Coverity analysis defect 174890: Resource leak (Closed)
- WT-14935 Solve: SUMMARY: ThreadSanitizer: data race /home/ec2-user/work/git/wiredtiger-arm/src/support/mtx_rw.c:168:71 in __read_blocked (Closed)
- WT-14980 Disagg table ID namespacing not correctly feature-gated (Closed)
- WT-14865 Create a parser script for wiredtiger config string in turtle files (Closed)
- SERVER-106431 Update version cursor config to new format (Closed)
- WT-14826 Write the prepare timestamp and prepared id to disk with preserve prepared config (Closed)
- WT-14828 Ensure we set the prepare id when preparing a transaction if preserve prepare config is on (Closed)
- WT-14833 Fix TCMalloc build/propagation for some stress tests (Closed)
- WT-14858 Forbid to prepare a transaction before the stable timestamp if preserve_prepare config is on (Closed)
- WT-14869 Pack prepared ts and prepared id correctly to cell format and unpack them accordingly (Closed)
- WT-14878 Assign prepared id and prepard ts on page deltas (Closed)
- WT-14901 Enable all examples regular testing with TSAN by suppressing all the warnings (Closed)
- WT-14951 Merge newer disagg code into develop (Closed)
- WT-14978 Add diagnostic information to durable timestamp assertion (Closed)
- WT-14837 Add metric to measure execution time of block_first_srch() (Closed)
- WT-14727 update the workgen latency metrics to print the bucket count for us, ms and secs (Closed)
- WT-14832 Add read operations to test/model (Closed)
- WT-9931 Reader took an order of magnitude longer for when all history store records were invisible (Closed)
- WT-14896 Failed: s-outdated-fixmes on ~ Infrequent checks [WiredTiger (develop) @ 2b0ed0cf] (Closed)
- WT-14946 Disable incompatible tests between disagg and tiered (Closed)
- WT-14947 Suppress perf critical warnings (Closed)
- WT-14953 test_layered17 spinlock abort - pthread_mutex_lock: (null): Invalid argument (Closed)
- WT-14954 Make test_truncate02 more reliable (Closed)
- WT-14955 failed: format-failure-configs-test on ubuntu2004 [wiredtiger @ a4f10c8e] (Closed)
- WT-14956 AssertionError in test_rollback01.py: no rollback occurred on cursor->next() for disagg (Closed)
related to:
- WT-14653 Add logs/stats to reconciliation for tracking HS updates (Closed)
- WT-14619 Merge layered tables into develop (Closed)
- WT-14562 Dump all extent list blocks when we do a corrupt block dump (Closed)
- WT-14695 Merge page deltas into develop (Closed)
- WT-14696 Merge precise checkpoint into develop (Closed)
- WT-14697 Merge disagg testing code into develop (Closed)
- WT-14698 Merge remaining disagg code into develop (Closed)
- WT-14826 Write the prepare timestamp and prepared id to disk with preserve prepared config (Closed)
- WT-14828 Ensure we set the prepare id when preparing a transaction if preserve prepare config is on (Closed)
- WT-14727 update the workgen latency metrics to print the bucket count for us, ms and secs (Closed)
- WT-14832 Add read operations to test/model (Closed)
- WT-12337 Review and fix WT_ASSERTs in packing_inline.h (Closed)
- WT-13985 Did not run a sweep for 60 minutes in test_prepare_hs01 for CS (Closed)
- WT-14648 Fix log subsystem returning EBUSY from conn->close() (Closed)
- WT-14719 Update cache workloads to adapt to stat name change for cache_eviction_trigger_clean_reached (Closed)
- WT-13038 task-timed-out: csuite-timestamp-abort-test-s3 on ubuntu2004 (Closed)