  1. WiredTiger
  2. WT-3284

tree-walk restart bug

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • WT2.9.3, 3.2.19, 3.4.10, 3.5.9
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage 2017-05-08
    • v3.4, v3.2

      One of our automated tests failed after hitting an assertion. The Python test suite reported a lost connection during test 'test_cursor_random.test_cursor_random.test_cursor_random_deleted_partial(table.sample)'. The original test run was:

      http://build.wiredtiger.com:8080/job/wiredtiger-test-coverage/2962

      The call stack from the core file is:

      (gdb) where
      #0  0x0000003e6ca349c8 in raise () from /lib64/libc.so.6
      #1  0x0000003e6ca3665a in abort () from /lib64/libc.so.6
      #2  0x00007efe3900de33 in __wt_abort () from .libs/libwiredtiger-2.9.2.so
      #3  0x00007efe3907ab44 in __wt_assert () from .libs/libwiredtiger-2.9.2.so
      #4  0x00007efe38f354ee in __tree_walk_internal () from .libs/libwiredtiger-2.9.2.so
      #5  0x00007efe38f35923 in __wt_tree_walk_skip () from .libs/libwiredtiger-2.9.2.so
      #6  0x00007efe38f007e4 in __wt_btcur_next_random () from .libs/libwiredtiger-2.9.2.so
      #7  0x00007efe38f83f45 in __wt_curfile_next_random () from .libs/libwiredtiger-2.9.2.so
      #8  0x00007efe38fb4678 in __curtable_next_random () from .libs/libwiredtiger-2.9.2.so
      #9  0x00007efe393a72fd in _wrap_Cursor_next (self=<optimized out>, args=<optimized out>) at wiredtiger_wrap.c:3887
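
      Frames #2 (__wt_abort) and #3 (__wt_assert) have the usual shape of a failed assertion: a helper reports the failure and then aborts the process, which raises SIGABRT. A minimal sketch of that pattern follows. The names MY_ASSERT, my_assert_fail and my_format_assert are hypothetical, and this is not WiredTiger's implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format the message an assertion failure would log. */
int
my_format_assert(
    char *buf, size_t len, const char *expr, const char *file, int line)
{
	return snprintf(buf, len, "%s:%d: assertion failed: %s", file, line, expr);
}

/* Hypothetical failure path: log the message, then abort (raises SIGABRT). */
void
my_assert_fail(const char *expr, const char *file, int line)
{
	char buf[256];

	(void)my_format_assert(buf, sizeof(buf), expr, file, line);
	fprintf(stderr, "%s\n", buf);
	abort();
}

/* The macro stringifies the expression so the log names the failed check. */
#define MY_ASSERT(exp) \
	((exp) ? (void)0 : my_assert_fail(#exp, __FILE__, __LINE__))
```

      With a pattern like this, the logged file and line identify which check fired even when the core file has no debug symbols, assuming the test's stderr output was captured.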
      

      The only other thread not waiting on a condition is:

      Thread 2 (Thread 0x7efe36768700 (LWP 93680)):
      #0  0x0000003e6cae6e07 in sched_yield () from /lib64/libc.so.6
      #1  0x00007efe3901a6dd in __wt_yield () from .libs/libwiredtiger-2.9.2.so
      #2  0x00007efe3908209d in __wt_writeunlock () from .libs/libwiredtiger-2.9.2.so
      #3  0x00007efe39033956 in __wt_reconcile () from .libs/libwiredtiger-2.9.2.so
      #4  0x00007efe38fc9bae in __evict_review () from .libs/libwiredtiger-2.9.2.so
      #5  0x00007efe38fc848b in __wt_evict () from .libs/libwiredtiger-2.9.2.so
      #6  0x00007efe38fc5687 in __evict_page () from .libs/libwiredtiger-2.9.2.so
      #7  0x00007efe38fc1cb5 in __evict_lru_pages () from .libs/libwiredtiger-2.9.2.so
      #8  0x00007efe38fc0490 in __evict_pass () from .libs/libwiredtiger-2.9.2.so
      #9  0x00007efe38fbf54b in __evict_server () from .libs/libwiredtiger-2.9.2.so
      #10 0x00007efe38fbef77 in __wt_evict_thread_run () from .libs/libwiredtiger-2.9.2.so
      #11 0x00007efe3908c4e8 in __thread_run () from .libs/libwiredtiger-2.9.2.so
      #12 0x0000003e6ce07555 in start_thread () from /lib64/libpthread.so.0
      #13 0x0000003e6cb02ded in clone () from /lib64/libc.so.6
      

      Since the build doesn't have debug symbols, it's difficult to extract more information. It's likely that the assertion was either:

      /*
       * If restarting from some original position,
       * repeat the increment or decrement we made at
       * that time. Otherwise, couple is an internal
       * page we've acquired after moving from that
       * starting position and we can treat it as a
       * new page. This works because we never acquire
       * a hazard pointer on a leaf page we're not
       * going to return to our caller, this will quit
       * working if that ever changes.
       */
      WT_ASSERT(session,
          couple == couple_orig ||
          WT_PAGE_IS_INTERNAL(couple->page));
      ref = couple;
      __ref_index_slot(session, ref, &pindex, &slot);
      if (couple == couple_orig)
              break;
      

      OR:

              /* If no page is active, begin a walk from the start/end of the tree. */
              if (ref == NULL) {
      restart:        /*
                       * We can be here with a NULL or root WT_REF; the page release
                       * function handles them internally, don't complicate this code
                       * by calling them out.
                       */
                      WT_ERR(__wt_page_release(session, couple, flags));

                      /*
                       * We're not supposed to walk trees without root pages. As this
                       * has not always been the case, assert to debug that change.
                       */
                      WT_ASSERT(session, btree->root.page != NULL);

                      couple = couple_orig = ref = &btree->root;
                      initial_descent = true;
                      goto descend;
              }
      

      It could also be an assertion inside an inline function or macro.
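
      The first candidate assertion encodes an invariant: on restart, the coupled reference is either the original starting position or an internal page. A toy model of that check is sketched below. The types toy_page and toy_ref are hypothetical stand-ins for WT_PAGE and WT_REF, and the real structures and walk logic are considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for WT_PAGE and WT_REF; not the real definitions. */
typedef enum { TOY_PAGE_INTERNAL, TOY_PAGE_LEAF } toy_page_type;
typedef struct { toy_page_type type; } toy_page;
typedef struct { toy_page *page; } toy_ref;

/*
 * Return true if a restart from "couple" satisfies the invariant in the
 * quoted comment: either we are back at the original position, or couple
 * is an internal page acquired after moving away from it.
 */
bool
toy_restart_ok(const toy_ref *couple, const toy_ref *couple_orig)
{
	return couple == couple_orig || couple->page->type == TOY_PAGE_INTERNAL;
}

/*
 * Return true if the restart must repeat the original increment or
 * decrement, which the quoted comment ties to restarting from the
 * original position.
 */
bool
toy_restart_repeats_step(const toy_ref *couple, const toy_ref *couple_orig)
{
	return couple == couple_orig;
}
```

      In this model, a leaf page that is not the original position fails toy_restart_ok, which is the case the assertion guards against.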

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod
