Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6743

Save and restore session's data handle during cursor reopen call

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • WT10.0.0, 4.9.0, 4.4.4
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Storage - Ra 2020-12-14

      There is scenario reported from MongoDB testing where __wt_cursor_cache_release accessed session->dhandle but the underlying data structure has been freed by sweep server. The session->dhandle was being used to update stats. There was a recent change, WT-6478, that fixed the ordering of logging statements and might have exposed an underlying bug.

      The backtrace for illegal access look this:

      #6  0x0000557bfe58ad8b in __asan_report_load8 ()
          at /data/mci/9d9a8143741722fd82fd8405503b7409/toolchain-builder/tmp/build-llvm.sh-QI7/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:121
      #7  0x0000557bff45a080 in __wt_cursor_cache_release (session=<optimized out>, cursor=0x61900032c880, 
          released=0x7f9d5e99dd80) at src/third_party/wiredtiger/src/cursor/cur_std.c:740
      #8  0x0000557bff43798f in __curfile_close (cursor=0x61900032c880) at src/third_party/wiredtiger/src/cursor/cur_file.c:494
      #9  0x0000557bff370633 in mongo::WiredTigerSession::closeAllCursors (this=0x60700002e490, uri=...)
          at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:175
      #10 0x0000557bff2a4474 in mongo::WiredTigerKVEngine::dropIdent (this=0x61900006f980, opCtx=<optimized out>, 
          ru=<optimized out>, ident=...) at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp:1860
      #11 0x0000557c01364126 in mongo::TemporaryKVRecordStore::deleteTemporaryTable (this=0x6030002073a0, opCtx=0x6130001b2740)
          at src/mongo/db/storage/kv/temporary_kv_record_store.cpp:52
      #12 0x0000557c0182d7f1 in mongo::IndexBuildInterceptor::deleteTemporaryTables (this=0x60f0002298d0, 
          opCtx=<optimized out>) at src/mongo/db/index/index_build_interceptor.cpp:94
      #13 0x0000557c0649004a in mongo::RecoveryUnit::commitRegisteredChanges (this=<optimized out>, commitTimestamp=...)
          at src/mongo/db/storage/recovery_unit.cpp:87
      #14 0x0000557bff34bf66 in mongo::WiredTigerRecoveryUnit::_commit (this=0x6110001691c0)
          at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:191
      #15 0x0000557bff34fe74 in mongo::WiredTigerRecoveryUnit::doCommitUnitOfWork (this=0x6110001691c0)
          at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:238
      #16 0x0000557c0648db75 in mongo::RecoveryUnit::commitUnitOfWork (this=0x6110001691c0)

      Snippet of code from __wt_cursor_cache_release where illegal access happened:

          /*
           * Caching the cursor releases its data handle. So we have to update statistics first. If
           * caching fails, we'll decrement the statistics after reopening the cursor (and getting the
           * data handle back).
           */
          WT_STAT_CONN_INCR(session, cursor_cache);
          WT_STAT_DATA_INCR(session, cursor_cache);  <-------------------- 
          WT_ERR(cursor->cache(cursor));
      

       Dump for dhandle:

      (gdb) p/x* ((WT_SESSION_IMPL*)cursor->session)->dhandle
      $2 = {rwlock = {u = {v = 0xabababab470000c3, s = {current = 0xc3, next = 0x0, reader = 0x0, readers_queued = 0x47, 
              readers_active = 0xabababab}}, stat_read_count_off = 0xabab, stat_write_count_off = 0xabab, 
          stat_app_usecs_off = 0xabab, stat_int_usecs_off = 0xabab, stat_session_usecs_off = 0xabab, 
          cond_readers = 0xabababababababab, cond_writers = 0xabababababababab}, q = {tqe_next = 0xabababababababab, 
          tqe_prev = 0xabababababababab}, hashq = {tqe_next = 0xabababababababab, tqe_prev = 0xabababababababab}, 
        name = 0xabababababababab, name_hash = 0xabababababababab, checkpoint = 0xabababababababab, cfg = 0xabababababababab, 
        meta_base = 0xabababababababab, session_ref = 0xabababab, session_inuse = 0xabababab, excl_ref = 0xabababab, 
        timeofdeath = 0xabababababababab, excl_session = 0xabababababababab, dsrc = 0xabababababababab, 
        handle = 0xabababababababab, type = 0xabababab, compact_skip = 0xab, close_lock = {lock = {__data = {
              __lock = 0xabababab, __count = 0xabababab, __owner = 0xabababab, __nusers = 0xabababab, __kind = 0xabababab, 
              __spins = 0xabab, __elision = 0xabab, __list = {__prev = 0xabababababababab, __next = 0xabababababababab}}, 
            __size = {0xab <repeats 40 times>}, __align = 0xabababababababab}, name = 0xabababababababab, 
          stat_count_off = 0xabab, stat_app_usecs_off = 0xabab, stat_int_usecs_off = 0xabab, stat_session_usecs_off = 0xabab, 
          initialized = 0xab}, stats = {0xabababababababab <repeats 23 times>}, stat_array = 0xabababababababab, 
        flags = 0xabababab} 

      ASAN reports that the data structure memory was freed from here:

      freed by thread T12 here: 
      #0 0x557bfe584332 in free (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x1e99332) 
      #1 0x557bff3eb3a4 in __conn_dhandle_destroy (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d003a4) 
      #2 0x557bff3f1e3b in __wt_conn_dhandle_discard_single (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d06e3b) 
      #3 0x557bff40b8e7 in __sweep_server (/data/mci/9a51bb629d4efbf952ef2de46f7b0fa3/src/dist-test/bin/mongod+0x2d208e7) 
      #4 0x7f9d87cd76da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) 

            Assignee:
            donald.anderson@mongodb.com Donald Anderson
            Reporter:
            haseeb.bokhari@mongodb.com Haseeb Bokhari (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: