hello_with_standby.js: segmentation fault in __wt_evict_file during shutdown on standby


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Layered Tables
    • Storage Engines, Storage Engines - Foundations

      We recently observed a segmentation fault in a mongod process running as a standby node in disaggregated storage mode. The crash occurred in __wt_evict_file during clean shutdown, with an invalid access at address 0xd8. The conditions resemble SLS-2861, which added diagnostics to help debug future occurrences like this one.

      Task: https://spruce.mongodb.com/task/mongod_v8.0_amazon_linux2_arm64_dynamic_compile_display_disagg_storage_296add02472f30f38c088f407ec866c36d9e2e3f_25_06_11_19_36_55/execution-tasks?execution=0&sorts=STATUS%3AASC

      Logs: https://parsley.mongodb.com/test/mongod_v8.0_amazon_linux2_arm64_dynamic_compile_disagg_storage_2_linux_enterprise_296add02472f30f38c088f407ec866c36d9e2e3f_25_06_11_19_36_55/0/0e859443c577b7d5fd73cf164c518f4f?shareLine=1737

      {"ts_sec":1749672365,"ts_usec":845232,"thread":"8376:0xffff81150350","session_dhandle_name":"file:WiredTigerShared.wt_stable","session_name":"WT_CONNECTION.reconfigure","category":"WT_VERB_RTS","category_id":37,"verbose_level":"WARNING","verbose_level_id":-2,"msg":"skipped shutdown RTS due to disagg"}}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.845+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1749672365,"ts_usec":845412,"thread":"8376:0xffff81150350","session_dhandle_name":"file:WiredTigerShared.wt_stable","session_name":"WT_CONNECTION.reconfigure","category":"WT_VERB_RECOVERY_PROGRESS","category_id":36,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"shutdown was completed successfully and took 489ms, including 0ms for the rollback to stable, and 0ms for the checkpoint."}}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.845+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"SignalHandler","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0xd8\n"}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.845+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"SignalHandler","msg":"Writing fatal message","attr":{"message":"Dumping siginfo (si_code=1): 0b 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n"}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.845+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"SignalHandler","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.920+00:00"},"s":"E",  "c":"CONTROL",  "id":31430,   "ctx":"SignalHandler","msg":"Error collecting stack trace","attr":{"error":"unw_get_proc_name(FFFF9FA78850): unspecified (general) error\n"}}
      [js_test:hello_with_standby] d21547| {"t":{"$date":"2025-06-11T20:06:05.920+00:00"},"s":"I",  "c":"CONTROL",  "id":31380,   "ctx":"SignalHandler","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"FFFF9E4C7664","b":"FFFF9E25B000","o":"26C664","s":"_ZN5mongo15printStackTraceEv","C":"mongo::printStackTrace()","s+":"44"},{"a":"FFFF9E4C2AC8","b":"FFFF9E25B000","o":"267AC8","s":"abruptQuitWithAddrSignal","s+":"148"},{"a":"FFFF9FA78850","b":"FFFF9FA78000","o":"850","s":"__kernel_rt_sigreturn","s+":"0"},{"a":"FFFF832D72A4","b":"FFFF83110000","o":"1C72A4","s":"__wt_evict_file","s+":"24"},{"a":"FFFF8326C184","b":"FFFF83110000","o":"15C184","s":"__wt_conn_dhandle_close","s+":"2D4"},{"a":"FFFF8326DB74","b":"FFFF83110000","o":"15DB74","s":"__wti_conn_dhandle_discard_single","s+":"150"},{"a":"FFFF8326DD00","b":"FFFF83110000","o":"15DD00","s":"__wti_conn_dhandle_discard","s+":"B0"},{"a":"FFFF83276D54","b":"FFFF83110000","o":"166D54","s":"__wti_connection_close","s+":"294"},{"a":"FFFF8325C190","b":"FFFF83110000","o":"14C190","s":"__conn_close","s+":"6E0"},{"a":"FFFF8A268750","b":"FFFF8A1A9000","o":"BF750","s":"_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv","C":"mongo::WiredTigerKVEngine::cleanShutdown()","s+":"910"},{"a":"FFFF90D525DC","b":"FFFF90D2F000","o":"235DC","s":"_ZN5mongo12_GLOBAL__N_134shutdownGlobalStorageEngineCleanlyEPNS_14ServiceContextENS_6StatusEb.constprop.0","C":"mongo::(anonymous namespace)::shutdownGlobalStorageEngineCleanly(mongo::ServiceContext*, mongo::Status, bool) [clone .constprop.0]","s+":"38"},{"a":"FFFF90D52BA8","b":"FFFF90D2F000","o":"23BA8","s":"_ZN5mongo34shutdownGlobalStorageEngineCleanlyEPNS_14ServiceContextE","C":"mongo::shutdownGlobalStorageEngineCleanly(mongo::ServiceContext*)","s+":"64"},{"a":"FFFF9D374A18","b":"FFFF9D321000","o":"53A18","s":"_ZN5mongo12_GLOBAL__N_112shutdownTaskERKNS_16ShutdownTaskArgsE","C":"mongo::(anonymous namespace)::shutdownTask(mongo::ShutdownTaskArgs const&)","s+":"1F68"},{"a":"FFFF9E4BC6B0","b":"FFFF9E25B000","o":"2616B0","s":"_ZN5mongo12_GLOBAL__N_126runRegisteredShutdownTasksESt5stackINS_15unique_functionIFvRKNS_16ShutdownTaskArgsEEEESt5dequeIS7_SaIS7_EEES5_","C":"mongo::(anonymous 
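
      For triage, the raw "Dumping siginfo" bytes in the log above can be decoded to cross-check the other fatal messages. The following is a minimal Python sketch and is not part of the original report. It assumes the common 64-bit Linux siginfo_t layout (three little-endian 32-bit integers, 4 bytes of padding, then the 64-bit fault address at offset 16). The helper name decode_siginfo is hypothetical.

```python
import struct

# First 24 bytes copied from the "Dumping siginfo" log line; the rest of the
# dump in the log is all zeros and is not needed here.
SIGINFO_HEX = (
    "0b 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 "
    "d8 00 00 00 00 00 00 00"
)

# si_code values for SIGSEGV as defined by Linux.
SEGV_CODES = {1: "SEGV_MAPERR", 2: "SEGV_ACCERR"}


def decode_siginfo(hex_dump: str) -> dict:
    """Decode the leading fields of a siginfo_t hex dump.

    Assumption: 64-bit little-endian Linux layout, i.e. si_signo, si_errno,
    si_code as 32-bit ints, 4 padding bytes, then si_addr as a 64-bit value.
    """
    raw = bytes.fromhex(hex_dump)
    si_signo, si_errno, si_code, si_addr = struct.unpack_from("<iii4xQ", raw)
    return {
        "si_signo": si_signo,
        "si_errno": si_errno,
        "si_code": si_code,
        "si_code_name": SEGV_CODES.get(si_code, "unknown"),
        "si_addr": hex(si_addr),
    }


if __name__ == "__main__":
    for field, value in decode_siginfo(SIGINFO_HEX).items():
        print(f"{field}: {value}")
```

      Under that layout assumption, the dump decodes to signal 11 with si_code 1 (SEGV_MAPERR, address not mapped) and fault address 0xd8, which agrees with the "Invalid access at address: 0xd8" and "Got signal: 11" messages. A fault address this small is consistent with reading a field at offset 0xd8 through a NULL pointer in __wt_evict_file, but the log alone does not confirm that.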
      

            Assignee: Unassigned
            Reporter: Benety Goh
            Votes: 0
            Watchers: 0
