Type: Bug
Resolution: Fixed
Priority: Critical - P2
Affects Version/s: None
Component/s: Cache and Eviction, Disagg CI-blocker
Security Level: Public (Available to anyone on the web)
Storage Engines, Storage Engines - Server Integration
1.016
WhatThePelly - 2025-09-02, SE Persistence - 2025-08-15
8
Not Needed
While running performance tests on the disagg mongod integration branch, I saw sporadic WiredTiger failures in the YCSB workloads indicating a checksum error:
{"t":{"$date":"2025-08-13T04:00:06.563+00:00"},"s":"E", "c":"WT", "id":22435, "ctx":"conn70","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1755057606,"ts_usec":563828,"thread":"7839:0xffffb6657980","session_dhandle_name":"file:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable","session_name":"WT_CURSOR.search","category":"WT_VERB_DEFAULT","log_id":1000000,"category_id":12,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__block_disagg_read_checksum_err:47:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable: read checksum error for 0B block at page 1147405, lsn 7537909630182098037: block header checksum of 2366917735 doesn't match expected checksum of 6b8e8f16"}}}
{"t":{"$date":"2025-08-13T04:00:06.563+00:00"},"s":"E", "c":"WT", "id":22435, "ctx":"conn70","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1755057606,"ts_usec":563941,"thread":"7839:0xffffb6657980","session_dhandle_name":"file:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable","session_name":"WT_CURSOR.search","category":"WT_VERB_DEFAULT","log_id":1000000,"category_id":12,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__wt_bm_corrupt_dump:77:{0: 1147405, 0, 0x6b8e8f16}: empty buffer, no dump available"}}}
{"t":{"$date":"2025-08-13T04:00:06.563+00:00"},"s":"E", "c":"WT", "id":22435, "ctx":"conn70","msg":"WiredTiger error message","attr":{"error":-31802,"message":{"ts_sec":1755057606,"ts_usec":563967,"thread":"7839:0xffffb6657980","session_dhandle_name":"file:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable","session_name":"WT_CURSOR.search","category":"WT_VERB_DEFAULT","log_id":1000000,"category_id":12,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__block_disagg_read_multiple:239:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable: fatal read error","error_str":"WT_ERROR: non-specific WiredTiger error","error_code":-31802}}}
{"t":{"$date":"2025-08-13T04:00:06.564+00:00"},"s":"E", "c":"WT", "id":22435, "ctx":"conn70","msg":"WiredTiger error message","attr":{"error":-31804,"message":{"ts_sec":1755057606,"ts_usec":563992,"thread":"7839:0xffffb6657980","session_dhandle_name":"file:collection-4b15b48b-0f49-4d6c-9174-a4a345c9615a.wt_stable","session_name":"WT_CURSOR.search","category":"WT_VERB_DEFAULT","log_id":1000000,"category_id":12,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__block_disagg_read_multiple:239:the process must exit and restart","error_str":"WT_PANIC: WiredTiger library panic","error_code":-31804}}}
{"t":{"$date":"2025-08-13T04:00:06.564+00:00"},"s":"F", "c":"ASSERT", "id":23089, "ctx":"conn70","msg":"Fatal assertion","attr":{"msgid":50853,"location":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp:645:9:int mongo::{anonymous}::mdb_handle_error_with_startup_suppression(WT_EVENT_HANDLER*, WT_SESSION*, int, const char*)"}}
{"t":{"$date":"2025-08-13T04:00:06.564+00:00"},"s":"F", "c":"ASSERT", "id":23090, "ctx":"conn70","msg":"\n\n***aborting after fassert() failure\n\n"}
This seems to happen across various YCSB workloads, and it's not deterministic. This ticket is to investigate and fix the issue. Here's an example of a failing YCSB workload with the checksum error.
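The log reports the mismatch on a zero-byte ("0B") block whose buffer is empty, so the checksum computed on read cannot match the non-zero expected value. A minimal sketch of read-side checksum verification illustrates why that read fails. It assumes a CRC-32C (Castagnoli) checksum over the block payload, the algorithm WiredTiger's general checksum routine uses. `verify_block`, `ChecksumError`, and the "payload plus separately stored expected checksum" layout are hypothetical and do not reflect the actual disagg block header format.

```python
# Hypothetical sketch of read-side block checksum verification.
# Assumption: CRC-32C over the raw payload, expected checksum stored elsewhere.
# This is NOT WiredTiger's disagg block layout or API.


def _make_crc32c_table() -> list:
    """Build the byte-wise lookup table for CRC-32C (reflected poly 0x82F63B78)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = ((crc >> 1) ^ 0x82F63B78) if crc & 1 else (crc >> 1)
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C; an empty input yields 0."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


class ChecksumError(Exception):
    """Raised when a block's computed checksum differs from the expected one."""


def verify_block(payload: bytes, expected: int) -> bytes:
    """Return the payload if its CRC-32C matches `expected`, else raise."""
    actual = crc32c(payload)
    if actual != expected:
        raise ChecksumError(
            f"read checksum error for {len(payload)}B block: "
            f"computed {actual:#010x}, expected {expected:#010x}"
        )
    return payload


if __name__ == "__main__":
    page = b"page image bytes"
    verify_block(page, crc32c(page))  # intact block passes
    try:
        # Zero-byte read, analogous to the "0B block" / "empty buffer" in the log.
        verify_block(b"", 0x6B8E8F16)
    except ChecksumError as err:
        print(err)
```

Because the checksum of an empty buffer is 0 under this scheme, any read that returns no bytes for a page with a real stored checksum is reported as corruption, even if the underlying problem is that the page service returned nothing rather than damaged bytes.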
Related to:
- WT-15266 Dump all pages from the pali response in the results array on checksum failure (Open)
- SERVER-110431 Invariant on empty phylog pages in the write path (Closed)