WiredTiger / WT-675

LSM read fails with ENOENT return

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT1.6.5
    • Affects Version/s: None
    • Component/s: None

      Using the following test/format configuration:

      data_source=lsm
      cache=20
      compression=none
      key_max=64          
      ops=2000
      rows=10000
      insert_pct=45
      leaf_page_max=12
      value_max=64    
      threads=25 
      

      I see the following failure:

      t: read_row: read row 17: No such file or directory
      

      The failure reproduces quickly on tinderbox, generally within 10 runs. Digging further, the metadata search performed when opening a handle returns WT_NOTFOUND:

      (gdb) where
      #0  __conn_btree_config_set (session=0x8de9f0) at ../src/conn/conn_dhandle.c:250
      #1  0x00000000004668a7 in __conn_btree_open (session=0x8de9f0, op_cfg=0x0, flags=0)
          at ../src/conn/conn_dhandle.c:308
      #2  0x0000000000466d51 in __wt_conn_btree_get (session=0x8de9f0,
          name=0x7fffdc0139a0 "file:wt-000002.lsm", ckpt=0x0, op_cfg=0x0, flags=0)
          at ../src/conn/conn_dhandle.c:417
      #3  0x0000000000434db0 in __session_open_btree (session=0x8de9f0,
          name=0x7fffdc0139a0 "file:wt-000002.lsm", ckpt=0x0, op_cfg=0x0, dead=0,
          flags=0) at ../src/session/session_dhandle.c:248
      #4  0x000000000043500b in __wt_session_get_btree (session=0x8de9f0,
          uri=0x7fffdc0139a0 "file:wt-000002.lsm", checkpoint=0x0, cfg=0x0, flags=0)
          at ../src/session/session_dhandle.c:308
      #5  0x0000000000434bf3 in __wt_session_get_btree_ckpt (session=0x8de9f0,
          uri=0x7fffdc0139a0 "file:wt-000002.lsm", cfg=0x0, flags=0)
          at ../src/session/session_dhandle.c:187
      #6  0x000000000046e9d0 in __wt_curfile_open (session=0x8de9f0,
          uri=0x7fffdc0139a0 "file:wt-000002.lsm", owner=0x901650, cfg=0x0,
          cursorp=0x8ec708) at ../src/cursor/cur_file.c:447
      #7  0x0000000000431def in __wt_open_cursor (session=0x8de9f0,
          uri=0x7fffdc0139a0 "file:wt-000002.lsm", owner=0x901650, cfg=0x0,
          cursorp=0x8ec708) at ../src/session/session_api.c:193
      #8  0x000000000047e685 in __clsm_open_cursors (clsm=0x901650, update=0,
          start_chunk=0, start_id=0) at ../src/lsm/lsm_cursor.c:340
      #9  0x000000000047db26 in __clsm_enter (clsm=0x901650, update=0)
          at ../src/lsm/lsm_cursor.c:93
      #10 0x000000000047fe43 in __clsm_search (cursor=0x901650)
          at ../src/lsm/lsm_cursor.c:743
      #11 0x000000000041284b in read_row (cursor=0x901650, key=0x7fffffffe270, keyno=10)
          at ../../../test/format/ops.c:518
      #12 0x00000000004126db in wts_read_scan () at ../../../test/format/ops.c:477
      

      The file exists on disk. The file was created via an lsm_tree_switch that completed without error (done in an application thread via __clsm_put).
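
      The symptom above (a file that exists on disk yet fails to open with "No such file or directory") fits a handle-open path that consults metadata first and reports a metadata miss as ENOENT. The following is a minimal sketch of that pattern under that assumption. It is not WiredTiger's code: SKETCH_NOTFOUND, sketch_meta_entry, sketch_metadata_search and sketch_open_handle are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "not found" status; NOT WiredTiger's actual WT_NOTFOUND value. */
#define SKETCH_NOTFOUND (-1)

/* Hypothetical in-memory metadata table entry: URI mapped to a config string. */
struct sketch_meta_entry {
    const char *uri;
    const char *config;
};

/*
 * Look up a URI in the metadata table.
 * Returns 0 and sets *configp on a hit, SKETCH_NOTFOUND on a miss.
 */
static int
sketch_metadata_search(const struct sketch_meta_entry *meta, size_t n,
    const char *uri, const char **configp)
{
    size_t i;

    assert(uri != NULL && configp != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(meta[i].uri, uri) == 0) {
            *configp = meta[i].config;
            return (0);
        }
    return (SKETCH_NOTFOUND);
}

/*
 * Open a handle for a URI. The on-disk file is never consulted here:
 * a metadata miss alone is enough to report ENOENT to the caller.
 */
static int
sketch_open_handle(const struct sketch_meta_entry *meta, size_t n,
    const char *uri)
{
    const char *config;
    int ret;

    ret = sketch_metadata_search(meta, n, uri, &config);
    if (ret == SKETCH_NOTFOUND)
        return (ENOENT);
    return (ret);
}
```

      In this sketch, a table that holds only file:wt-000001.lsm makes an open of file:wt-000002.lsm return ENOENT whether or not the file is on disk, which is the shape of the failure reported here.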

      The thread that gets the failed open is the last thread still running in test/format:

      (gdb) thread apply all where 3
      
      Thread 415 (Thread 0x7ffff6fd1700 (LWP 8623)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
      #1  0x0000000000424dbf in __wt_cond_wait (session=0x8dec00, cond=0x8ea9a0,
          usecs=100000) at ../src/os_posix/os_mtx.c:75
      #2  0x000000000042102c in __wt_lsm_checkpoint_worker (arg=0x8ebc80)
          at ../src/lsm/lsm_worker.c:314
      (More stack frames follow...)
      
      Thread 414 (Thread 0x7ffff67d0700 (LWP 8622)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
      #1  0x0000000000424dbf in __wt_cond_wait (session=0x8dee10, cond=0x8ea9a0,
          usecs=100000) at ../src/os_posix/os_mtx.c:75
      #2  0x00000000004207c5 in __wt_lsm_merge_worker (vargs=0x8ea7e0)
          at ../src/lsm/lsm_worker.c:109
      (More stack frames follow...)
      
      Thread 413 (Thread 0x7ffff5fcf700 (LWP 8621)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
      #1  0x0000000000424dec in __wt_cond_wait (session=0x8de7e0, cond=0x9015e0, usecs=0)
          at ../src/os_posix/os_mtx.c:82
      #2  0x00000000004683ad in __log_archive_server (arg=0x8de7e0)
          at ../src/conn/conn_log.c:134
      (More stack frames follow...)
      
      Thread 412 (Thread 0x7ffff77d2700 (LWP 8620)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
      #1  0x0000000000424dbf in __wt_cond_wait (session=0x8de5d0, cond=0x8e5020,
          usecs=100000) at ../src/os_posix/os_mtx.c:75
      #2  0x000000000043ffa9 in __wt_cache_evict_server (arg=0x8de5d0)
          at ../src/btree/bt_evict.c:167
      (More stack frames follow...)
      
      Thread 1 (Thread 0x7ffff7ddd740 (LWP 8176)):
      #0  __conn_btree_config_set (session=0x8de9f0) at ../src/conn/conn_dhandle.c:250
      #1  0x00000000004668a7 in __conn_btree_open (session=0x8de9f0, op_cfg=0x0, flags=0)
          at ../src/conn/conn_dhandle.c:308
      #2  0x0000000000466d51 in __wt_conn_btree_get (session=0x8de9f0,
          name=0x7fffdc0139a0 "file:wt-000002.lsm", ckpt=0x0, op_cfg=0x0, flags=0)
          at ../src/conn/conn_dhandle.c:417
      (More stack frames follow...)
      

      If I let the test run to completion, wt list shows the file as present:

      format $ ../../wt -h RUNDIR/ list 
      colgroup:wt
      file:wt-000001.lsm
      file:wt-000002.lsm
      lsm:wt
      table:wt
      

      I don't see any rollbacks via metadata tracking.

            Assignee: Unassigned
            Reporter: Alexander Gorrod (alexander.gorrod@mongodb.com)