Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.6.1
    • Labels:
      None
    • # Replies:
      6
    • Last comment by Customer:
      true

      Description

      A segfault is reported when testing on Solaris. The error log contains few details.

        Activity

        alexander.gorrod Alexander Gorrod added a comment -

        Call stack captured from the crashing thread:

        #0  0x0000000000468bf0 in __wt_readunlock (session=0x569650, rwlock=0xabababababababab) at src/os_posix/os_mtx_rw.c:137
        #1  0x00000000004629ea in __wt_lsm_tree_readunlock (session=0x569650, lsm_tree=0x7accd0) at src/lsm/lsm_tree.c:1153
        #2  0x00000000004fe5b8 in __wt_lsm_get_chunk_to_flush (session=0x569650, lsm_tree=0x7accd0, force=0, chunkp=0x7ffffd2faf50)
            at src/lsm/lsm_work_unit.c:83
        #3  0x00000000004636c9 in __lsm_worker_general_op (session=0x569650, cookie=0x55f8d8, completed=0x7ffffd2faf80)
            at src/lsm/lsm_worker.c:55
        #4  0x00000000004638e0 in __lsm_worker (arg=0x55f8d8) at src/lsm/lsm_worker.c:122
        #5  0x00007fffff2a0dba in _thrp_setup () from /lib/64/libc.so.1
        #6  0x00007fffff2a10d0 in ?? () from /lib/64/libc.so.1
        #7  0x0000000000000000 in ?? ()
        

        It appears likely that there is a race between shutting down and clearing out the work thread queue.

        alexander.gorrod Alexander Gorrod added a comment -

        This reproduced reliably on Solaris, running test/fops against an LSM tree.

        When I added the following assertion:

        --- a/src/lsm/lsm_tree.c
        +++ b/src/lsm/lsm_tree.c
        @@ -27,6 +27,7 @@ __lsm_tree_discard(WT_SESSION_IMPL *session, WT_LSM_TREE *lsm_
         
                WT_UNUSED(final);       /* Only used in diagnostic builds */
         
        +       WT_ASSERT(session, lsm_tree->queue_ref == 0);
                /* We may be destroying an lsm_tree before it was added. */
                if (F_ISSET(lsm_tree, WT_LSM_TREE_OPEN)) {
                        WT_ASSERT(session, final ||
        

        The assertion fired: queue_ref was still non-zero when the tree was discarded, which indicates a race clearing out the work queue on shutdown.

        xgen-internal-githook Githook User added a comment -

        Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>

        Message: Fix a race on shutdown in LSM.

        A worker thread could push a new work unit during shutdown, which
        could lead to a work unit being processed after the underlying tree
        was freed.

        refs WT-1935
        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/328aff0a3f93becec288d8c682650251002c8d23
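
The commit message above describes a work unit being pushed during shutdown and then processed after its tree was freed. The following is a minimal single-threaded sketch of one way to close that window, not the code from the commit: refuse pushes once a tree is marked inactive, and remove the tree's queued units before it is freed. The names `struct tree`, `struct work_queue`, `queue_push`, `queue_clear_tree` and `tree_shutdown` are hypothetical; only `queue_ref` is taken from the diff earlier in this issue. The sketch ignores atomicity and locking, which the real code has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8

/* Hypothetical stand-in for an LSM tree handle. */
struct tree {
    bool active;        /* cleared when shutdown of this tree begins */
    unsigned queue_ref; /* queued work units that point at this tree */
};

/* Hypothetical fixed-size work queue; each unit is just a tree pointer. */
struct work_queue {
    struct tree *units[QUEUE_MAX];
    size_t count;
};

/* Push a work unit, refusing if the tree is shutting down or the queue is full. */
static bool
queue_push(struct work_queue *q, struct tree *t)
{
    if (!t->active || q->count == QUEUE_MAX)
        return (false);
    q->units[q->count++] = t;
    t->queue_ref++;
    return (true);
}

/* Drop every queued unit that references the tree, keeping the others in order. */
static void
queue_clear_tree(struct work_queue *q, struct tree *t)
{
    size_t i, kept;

    for (i = 0, kept = 0; i < q->count; i++) {
        if (q->units[i] == t)
            t->queue_ref--;
        else
            q->units[kept++] = q->units[i];
    }
    q->count = kept;
}

/*
 * Mark the tree inactive, clear its queued units, and report whether
 * it is now safe to free (no queued unit still references it).
 */
static bool
tree_shutdown(struct work_queue *q, struct tree *t)
{
    t->active = false;
    queue_clear_tree(q, t);
    return (t->queue_ref == 0);
}
```

In this model, a call to `queue_push` after `tree_shutdown` returns false, so no unit for the closed tree can reappear in the queue once it has been cleared.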

        xgen-internal-githook Githook User added a comment -

        Author: Susan LoVerso (sueloverso) <sue@wiredtiger.com>

        Message: Increment queue_ref count before checking flag. WT-1935
        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/b45f027d009f76f947d0a353a7404b0db17e3e99
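
The commit message above is about ordering: take the queue reference before checking the flag. A hedged sketch of why that order matters, using C11 atomics rather than WiredTiger's own primitives, is shown here. If the pusher checked the flag first and incremented afterwards, the closing thread could clear the flag and observe a zero count in between, then free the tree while a push was still in flight. With increment-then-check on one side and clear-then-check on the other, at least one side sees the other's write. All names here (`struct tree`, `tree_init`, `push_work_unit`, `release_work_unit`, `close_tree_safe_to_free`) are hypothetical, apart from `queue_ref`, which appears in the diff earlier in this issue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LSM tree handle. */
struct tree {
    atomic_bool active;    /* cleared first by the closing thread */
    atomic_uint queue_ref; /* queued work units that reference the tree */
};

static void
tree_init(struct tree *t)
{
    atomic_init(&t->active, true);
    atomic_init(&t->queue_ref, 0u);
}

/*
 * Pushing side: take the reference first, then check the flag. If the
 * tree is already shutting down, drop the reference and refuse the push.
 */
static bool
push_work_unit(struct tree *t)
{
    atomic_fetch_add(&t->queue_ref, 1u);
    if (!atomic_load(&t->active)) {
        atomic_fetch_sub(&t->queue_ref, 1u);
        return (false);
    }
    /* ... a real queue would enqueue a unit pointing at t here ... */
    return (true);
}

/* Called once a queued work unit has been processed or discarded. */
static void
release_work_unit(struct tree *t)
{
    atomic_fetch_sub(&t->queue_ref, 1u);
}

/*
 * Closing side: clear the flag, then check the count. Returns true when
 * no reference is outstanding; a caller would keep draining or waiting
 * until it does before freeing the tree.
 */
static bool
close_tree_safe_to_free(struct tree *t)
{
    atomic_store(&t->active, false);
    return (atomic_load(&t->queue_ref) == 0u);
}
```

The default sequentially consistent ordering of these atomic operations is what the argument relies on; weaker memory orders would need separate justification.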

        xgen-internal-githook Githook User added a comment -

        Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

        Message: Merge pull request #1978 from wiredtiger/lsm-shutdown-race-2

        Increment queue_ref count before checking flag. WT-1935
        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/86f4de9603e12182f0141e9aa867294517733b08

        alexander.gorrod Alexander Gorrod added a comment -

        Fixed with merge of LSM bug fix.


          People

          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
              Days since reply:
              2 years, 6 weeks, 1 day ago
              Date of 1st Reply: