There is scenario reported from MongoDB testing where __wt_cursor_cache_release accessed session->dhandle but the underlying data structure has been freed by sweep server. The session->dhandle was being used to update stats. There was a recent change, WT-6478, that fixed the ordering of logging statements and might have exposed an underlying bug.
The backtrace for illegal access look this:
#6 0x0000557bfe58ad8b in __asan_report_load8 () at /data/mci/9d9a8143741722fd82fd8405503b7409/toolchain-builder/tmp/build-llvm.sh-QI7/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:121 #7 0x0000557bff45a080 in __wt_cursor_cache_release (session=<optimized out>, cursor=0x61900032c880, released=0x7f9d5e99dd80) at src/third_party/wiredtiger/src/cursor/cur_std.c:740 #8 0x0000557bff43798f in __curfile_close (cursor=0x61900032c880) at src/third_party/wiredtiger/src/cursor/cur_file.c:494 #9 0x0000557bff370633 in mongo::WiredTigerSession::closeAllCursors (this=0x60700002e490, uri=...) at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:175 #10 0x0000557bff2a4474 in mongo::WiredTigerKVEngine::dropIdent (this=0x61900006f980, opCtx=<optimized out>, ru=<optimized out>, ident=...) at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp:1860 #11 0x0000557c01364126 in mongo::TemporaryKVRecordStore::deleteTemporaryTable (this=0x6030002073a0, opCtx=0x6130001b2740) at src/mongo/db/storage/kv/temporary_kv_record_store.cpp:52 #12 0x0000557c0182d7f1 in mongo::IndexBuildInterceptor::deleteTemporaryTables (this=0x60f0002298d0, opCtx=<optimized out>) at src/mongo/db/index/index_build_interceptor.cpp:94 #13 0x0000557c0649004a in mongo::RecoveryUnit::commitRegisteredChanges (this=<optimized out>, commitTimestamp=...) at src/mongo/db/storage/recovery_unit.cpp:87 #14 0x0000557bff34bf66 in mongo::WiredTigerRecoveryUnit::_commit (this=0x6110001691c0) at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:191 #15 0x0000557bff34fe74 in mongo::WiredTigerRecoveryUnit::doCommitUnitOfWork (this=0x6110001691c0) at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:238 #16 0x0000557c0648db75 in mongo::RecoveryUnit::commitUnitOfWork (this=0x6110001691c0)
Snippet of code from __wt_cursor_cache_release where illegal access happened:
/*
* Caching the cursor releases its data handle. So we have to update statistics first. If
* caching fails, we'll decrement the statistics after reopening the cursor (and getting the
* data handle back).
*/
WT_STAT_CONN_INCR(session, cursor_cache);
WT_STAT_DATA_INCR(session, cursor_cache); <--------------------
WT_ERR(cursor->cache(cursor));
Dump for dhandle:
(gdb) p/x* ((WT_SESSION_IMPL*)cursor->session)->dhandle $2 = {rwlock = {u = {v = 0xabababab470000c3, s = {current = 0xc3, next = 0x0, reader = 0x0, readers_queued = 0x47, readers_active = 0xabababab}}, stat_read_count_off = 0xabab, stat_write_count_off = 0xabab, stat_app_usecs_off = 0xabab, stat_int_usecs_off = 0xabab, stat_session_usecs_off = 0xabab, cond_readers = 0xabababababababab, cond_writers = 0xabababababababab}, q = {tqe_next = 0xabababababababab, tqe_prev = 0xabababababababab}, hashq = {tqe_next = 0xabababababababab, tqe_prev = 0xabababababababab}, name = 0xabababababababab, name_hash = 0xabababababababab, checkpoint = 0xabababababababab, cfg = 0xabababababababab, meta_base = 0xabababababababab, session_ref = 0xabababab, session_inuse = 0xabababab, excl_ref = 0xabababab, timeofdeath = 0xabababababababab, excl_session = 0xabababababababab, dsrc = 0xabababababababab, handle = 0xabababababababab, type = 0xabababab, compact_skip = 0xab, close_lock = {lock = {__data = { __lock = 0xabababab, __count = 0xabababab, __owner = 0xabababab, __nusers = 0xabababab, __kind = 0xabababab, __spins = 0xabab, __elision = 0xabab, __list = {__prev = 0xabababababababab, __next = 0xabababababababab}}, __size = {0xab <repeats 40 times>}, __align = 0xabababababababab}, name = 0xabababababababab, stat_count_off = 0xabab, stat_app_usecs_off = 0xabab, stat_int_usecs_off = 0xabab, stat_session_usecs_off = 0xabab, initialized = 0xab}, stats = {0xabababababababab <repeats 23 times>}, stat_array = 0xabababababababab, flags = 0xabababab}
ASAN reports that the data structure memory was freed from here:
freed by thread T12 here: #0 0x557bfe584332 in free (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x1e99332) #1 0x557bff3eb3a4 in __conn_dhandle_destroy (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d003a4) #2 0x557bff3f1e3b in __wt_conn_dhandle_discard_single (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d06e3b) #3 0x557bff40b8e7 in __sweep_server (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d208e7) #4 0x7f9d87cd76da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
- is related to
-
WT-6478 Cursor cache statistics not getting incremented
- Closed