Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6628

Panic if switching slots gets an unexpected error

      I ran test/format on a small (100MB) file system to force ENOSPC. My configuration had logging enabled (i.e., logging=1 in CONFIG).  After running out of space, format hung. First it printed the following WT error messages:

      [1598548238:109309][17939:0x7ff8e1aa8700], WT_SESSION.commit_transaction: __posix_std_fallocate, 58: /mnt/test/RUNDIR/./WiredTigerLog.0000000008: fallocate:: No space left on device
      [1598548238:609985][17939:0x7ff8e1aa8700], WT_SESSION.commit_transaction: __posix_sys_fallocate, 75: /mnt/test/RUNDIR/./WiredTigerLog.0000000008: fallocate:: No space left on device
      [1598548239:110630][17939:0x7ff8e1aa8700], WT_SESSION.commit_transaction: __posix_posix_fallocate, 92: /mnt/test/RUNDIR/./WiredTigerLog.0000000008: fallocate:: No space left on device
      [1598548240:99223][17939:0x7ff8e12a7700], WT_SESSION.commit_transaction: __posix_open_file, 804: /mnt/test/RUNDIR/./WiredTigerTmplog.0000000008: handle-open: open: No space left on device
      

      I had run this test 6-10 times before this without seeing this problem. I think (but I'm not certain) this failure was the first time I saw a "No space left on device" error from opening a file.

      Attaching with GDB shows that all of the threads from format are in commit_transaction().  Most appear blocked in __wt_log_slot_join():

      #0  0x00007ff8e735713f in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, 
          writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, 
          timeout=timeout@entry=0x7ff8e2aa9a00) at ../sysdeps/unix/sysv/linux/select.c:41
      #1  0x00000000004a2941 in __wt_sleep (seconds=seconds@entry=0, 
          micro_seconds=micro_seconds@entry=1000) at ../src/os_posix/os_sleep.c:30
      #2  0x000000000048883c in __wt_log_slot_join (session=session@entry=0x1b38760, 
          mysize=mysize@entry=128, flags=flags@entry=0, myslot=myslot@entry=0x7ff8e2aa9b50)
          at ../src/log/log_slot.c:593
      #3  0x000000000048423b in __log_write_internal (flags=0, lsnp=0x0, record=0x3e85650, 
          session=0x1b38760) at ../src/log/log.c:2666
      #4  __wt_log_write (session=session@entry=0x1b38760, record=<optimized out>, lsnp=lsnp@entry=0x0, 
          flags=0) at ../src/log/log.c:2569
      #5  0x00000000004fa57a in __wt_txn_log_commit (session=session@entry=0x1b38760, 
          cfg=cfg@entry=0x7ff8e2aa9cb0) at ../src/txn/txn_log.c:325
      #6  0x00000000004f30c1 in __wt_txn_commit (session=session@entry=0x1b38760, 
          cfg=cfg@entry=0x7ff8e2aa9cb0) at ../src/txn/txn.c:1390
      #7  0x00000000004cf49a in __session_commit_transaction (wt_session=0x1b38760, 
          config=<optimized out>) at ../src/session/session_api.c:1652
      #8  0x00000000004114fe in commit_transaction (tinfo=tinfo@entry=0x3bb6000, 
          prepared=prepared@entry=false) at ../../../test/format/ops.c:550
      #9  0x0000000000413c71 in ops (arg=0x3bb6000) at ../../../test/format/ops.c:1086
      #10 0x00007ff8e81646db in start_thread (arg=0x7ff8e2aaa700) at pthread_create.c:463
      #11 0x00007ff8e7361a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      

      One thread, however, is in __log_newfile() where I'm guessing it is running into trouble creating a new log file (or finding the next preallocated one) due to the ENOSPC condition:

      #0  0x00007ff8e7343f37 in sched_yield () at ../sysdeps/unix/syscall-template.S:78
      #1  0x00000000004a2d08 in __wt_yield () at ../src/os_posix/os_yield.c:25
      #2  0x000000000047e9f1 in __log_newfile (session=session@entry=0x1b39330, 
          conn_open=conn_open@entry=false, created=created@entry=0x7ff8e12a69cf) at ../src/log/log.c:1147
      #3  0x000000000047ef5f in __wt_log_acquire (session=session@entry=0x1b39330, 
          recsize=<optimized out>, slot=slot@entry=0x1b9d730) at ../src/log/log.c:1348
      #4  0x0000000000488fba in __log_slot_new (session=0x1b39330) at ../src/log/log_slot.c:268
      #5  __log_slot_switch_internal (session=session@entry=0x1b39330, 
          myslot=myslot@entry=0x7ff8e12a6b50, forced=forced@entry=false, did_work=did_work@entry=0x0)
          at ../src/log/log_slot.c:364
      #6  0x00000000004896f2 in __wt_log_slot_switch (session=session@entry=0x1b39330, 
          myslot=myslot@entry=0x7ff8e12a6b50, retry=retry@entry=true, forced=forced@entry=false, 
          did_work=did_work@entry=0x0) at ../src/log/log_slot.c:405
      #7  0x000000000048442c in __log_write_internal (flags=0, lsnp=0x0, record=0x8495020, 
          session=0x1b39330) at ../src/log/log.c:2673
      #8  __wt_log_write (session=session@entry=0x1b39330, record=<optimized out>, lsnp=lsnp@entry=0x0, 
          flags=0) at ../src/log/log.c:2569
      #9  0x00000000004fa57a in __wt_txn_log_commit (session=session@entry=0x1b39330, 
          cfg=cfg@entry=0x7ff8e12a6cb0) at ../src/txn/txn_log.c:325
      #10 0x00000000004f30c1 in __wt_txn_commit (session=session@entry=0x1b39330, 
          cfg=cfg@entry=0x7ff8e12a6cb0) at ../src/txn/txn.c:1390
      #11 0x00000000004cf49a in __session_commit_transaction (wt_session=0x1b39330, 
          config=<optimized out>) at ../src/session/session_api.c:1652
      #12 0x00000000004114fe in commit_transaction (tinfo=tinfo@entry=0x3bb8100, 
          prepared=prepared@entry=false) at ../../../test/format/ops.c:550
      #13 0x0000000000413c71 in ops (arg=0x3bb8100) at ../../../test/format/ops.c:1086
      #14 0x00007ff8e81646db in start_thread (arg=0x7ff8e12a7700) at pthread_create.c:463
      #15 0x00007ff8e7361a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: