  1. WiredTiger
  2. WT-646

LSM segv with wtperf and parallel-pop-lsm configuration

    Details

    • Type: Task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The parallel-pop-lsm wtperf configuration is consistently dropping core.

      Here's the stack and other useful info.

      (gdb) p *lsm_tree
      $3 = {name = 0x19f0fe0 "lsm:test", config = 0x0, filename = 0x19f0fe4 "test", ... ckpt_session = 0x19da890, ckpt_tid = 139827049776896, bloom_session = 0x0, bloom_tid = 0, chunk = 0x19f0a40, chunk_alloc = 640, nchunks = 36, last = 257, old_chunks = 0x7f2be0202ca0, old_alloc = 640, nold_chunks = 80, old_avail = 31, flags = 28}

      Note lsm_tree->nchunks here is 36.

      (gdb) bt
      #0 0x000000000046cf90 in __clsm_open_cursors (clsm=0x7f2be4002b00, update=1,
          start_chunk=0, start_id=0) at ../src/lsm/lsm_cursor.c:339
      #1 0x000000000046c4da in __clsm_enter (clsm=0x7f2be4002b00, update=1)
          at ../src/lsm/lsm_cursor.c:93
      #2 0x000000000046f86e in __clsm_insert (cursor=0x7f2be4002b00)
          at ../src/lsm/lsm_cursor.c:1048
      #3 0x00000000004042b4 in populate_thread (arg=0x7fff485b5e90)
          at ../../../bench/wtperf/wtperf.c:467
      #4 0x00007f2c084dcc6b in start_thread () from /lib64/libpthread.so.0
      #5 0x00007f2c080215ed in clone () from /lib64/libc.so.6
      (gdb) p *clsm
      $4 = {... lsm_tree = 0x19f11a0, dsk_gen = 481, nchunks = 50, nupdates = 1, blooms = 0x7f2be51910f0, cursors = 0x7f2be5164070, current = 0x0, primary_chunk = 0x0, txnid_max = 0x7f2be51ef7f0, flags = 64}

      Note clsm->nchunks is 50 here.

      (gdb) p i
      $5 = 49
      (gdb) p skip_chunks
      $6 = 49

      When we execute:
      chunk = lsm_tree->chunk[i + start_chunk];

      the index is 49 (i = 49, start_chunk = 0), which is well beyond the 36 valid entries in the lsm_tree->chunk array.

      I debugged this. The issue is that we drop the lock to close the cursors, and while the lock is released the lsm_tree's set of chunks changes and shrinks. Any skip_chunks value computed before the lock was dropped is therefore no longer valid.

      I have a fix I'm trying.
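
      To illustrate the failure mode (this is not the actual fix), here is a minimal sketch of re-validating a cached chunk count after the lock is reacquired. The struct and function names (tree, cursor, chunk_index_ok, revalidate_skip) are hypothetical stand-ins, not WiredTiger code, and clamping is only one possible way to bring a stale value back in range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields relevant to the race. */
struct tree {
    size_t nchunks; /* valid entries in the chunk array; may shrink while unlocked */
};

struct cursor {
    size_t nchunks; /* the cursor's cached view of the tree's chunk count */
};

/* Returns 1 if i + start_chunk indexes a valid chunk in the tree, else 0. */
int
chunk_index_ok(const struct tree *t, size_t i, size_t start_chunk)
{
    assert(t != NULL);
    return (i + start_chunk < t->nchunks);
}

/*
 * Re-check a skip count that was computed before the lock was dropped.
 * Call with the lock held again: the result never exceeds the tree's
 * current chunk count or the cursor's cached count.
 */
size_t
revalidate_skip(const struct tree *t, const struct cursor *c, size_t stale_skip)
{
    size_t limit;

    assert(t != NULL && c != NULL);
    limit = c->nchunks < t->nchunks ? c->nchunks : t->nchunks;
    return (stale_skip < limit ? stale_skip : limit);
}
```

      With the values from the gdb session (cursor nchunks = 50, tree nchunks = 36, skip_chunks = 49), chunk_index_ok rejects index 49 and revalidate_skip brings the skip count back down to 36.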

              People

              • Assignee:
                Unassigned
                Reporter:
                sue.loverso Sue LoVerso