Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-9721

Format stress tests generate cores even when successful

    • 1
    • v6.0, v5.0, v4.4

      Format stress tests often have the following on successful runs:

       

      [2022/08/06 19:16:16.354] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.155 killed
      [2022/08/06 19:16:16.354] ./format.sh: line 403: kill: (-129947) - No such process
      [2022/08/06 19:16:16.354] ./format.sh: line 404: wait: pid 129947 is not a child of this shell
      [2022/08/06 19:16:16.639] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.157 killed
      [2022/08/06 19:16:16.639] ./format.sh: line 403: kill: (-130606) - No such process
      [2022/08/06 19:16:16.639] ./format.sh: line 404: wait: pid 130606 is not a child of this shell
      [2022/08/06 19:16:18.401] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.158 killed
      [2022/08/06 19:16:18.401] ./format.sh: line 403: kill: (-130642) - No such process
      [2022/08/06 19:16:18.401] ./format.sh: line 404: wait: pid 130642 is not a child of this shell
      [2022/08/06 19:16:18.664] ./format.sh: line 410: wait: pid 131519 is not a child of this shell
      [2022/08/06 19:16:18.664] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.159 successfully completed
      [2022/08/06 19:16:18.708] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.161 killed
      [2022/08/06 19:16:18.708] ./format.sh: line 403: kill: (-131761) - No such process
      [2022/08/06 19:16:18.708] ./format.sh: line 404: wait: pid 131761 is not a child of this shell
      [2022/08/06 19:16:18.811] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.162 killed
      [2022/08/06 19:16:18.811] ./format.sh: line 403: kill: (-132211) - No such process
      [2022/08/06 19:16:18.811] ./format.sh: line 404: wait: pid 132211 is not a child of this shell
      [2022/08/06 19:16:18.890] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.163 killed
      [2022/08/06 19:16:18.890] ./format.sh: line 403: kill: (-132556) - No such process
      [2022/08/06 19:16:18.890] ./format.sh: line 404: wait: pid 132556 is not a child of this shell
      [2022/08/06 19:16:19.282] format.sh: job in /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/RUNDIR.164 killed
      [2022/08/06 19:16:19.282] ./format.sh: line 403: kill: (-133240) - No such process
      [2022/08/06 19:16:19.329] ./format.sh: line 404: wait: pid 133240 is not a child of this shell
      [2022/08/06 19:16:19.329] format.sh: 157 successful jobs, 0 failed jobs
      [2022/08/06 19:16:19.348] format.sh: 157 successful jobs, 0 failed jobs
      [2022/08/06 19:16:19.348] format.sh: run ending at Sat Aug 6 19:16:19 UTC 2022
      [2022/08/06 19:16:19.354] Finished 'shell.exec' in "format test script" in 6h0m9.220337295s
      [2022/08/06 19:16:19.356] Task completed - SUCCESS.
      [2022/08/06 19:16:20.411] Running post-task commands.
      [2022/08/06 19:16:20.440] Running command 'shell.exec' in "dump stacktraces" (step 1.1 of 5)
      [2022/08/06 19:16:20.448] # Parse through any failures inside the WT_TEST directory.
      [2022/08/06 19:16:20.448] /opt/mongodbtoolchain/v3/bin/python3 ../test/evergreen/print_stack_trace.py --unit_test -e /opt/mongodbtoolchain/v3/bin/python3 -c WT_TEST -l .
      [2022/08/06 19:16:22.083] # Look through the stress format directory for any coredumps.
      [2022/08/06 19:16:22.083] cd test/format
      [2022/08/06 19:16:22.083] /opt/mongodbtoolchain/v3/bin/python3 ../../../test/evergreen/print_stack_trace.py --format -e ./t -c . -l .
      [2022/08/06 19:16:23.371] ----------------------
      [2022/08/06 19:16:23.371] | ./dump_t.129947.core |
      [2022/08/06 19:16:23.371] ----------------------
      [2022/08/06 19:16:23.371] [New LWP 129985]
      [2022/08/06 19:16:23.371] [New LWP 129947]
      [2022/08/06 19:16:23.371] [New LWP 129979]
      [2022/08/06 19:16:23.371] [New LWP 129990]
      [2022/08/06 19:16:23.371] [New LWP 133084]
      [2022/08/06 19:16:23.371] [New LWP 133085]
      [2022/08/06 19:16:23.371] [New LWP 132490]
      [2022/08/06 19:16:23.371] [New LWP 133086]
      [2022/08/06 19:16:23.371] [New LWP 133087]
      [2022/08/06 19:16:23.371] [New LWP 133088]
      [2022/08/06 19:16:23.371] [New LWP 129987]
      [2022/08/06 19:16:23.371] [New LWP 129988]
      [2022/08/06 19:16:23.371] [New LWP 129989]
      [2022/08/06 19:16:23.371] [New LWP 129986]
      [2022/08/06 19:16:23.865] [Thread debugging using libthread_db enabled]
      [2022/08/06 19:16:23.865] Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
      [2022/08/06 19:16:24.857] Core was generated by `./t -c /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/f'.
      [2022/08/06 19:16:24.857] Program terminated with signal SIGABRT, Aborted.
      [2022/08/06 19:16:24.857] #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      [2022/08/06 19:16:24.893] [Current thread is 1 (Thread 0xffff7ecad100 (LWP 129985))]
      [2022/08/06 19:16:24.893] #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      [2022/08/06 19:16:24.899] #1 0x0000ffff81909aac in __GI_abort () at abort.c:79
      [2022/08/06 19:16:24.899] #2 0x0000ffff81c821c0 in __wt_abort (session=session@entry=0x35cff100) at ../src/os_common/os_abort.c:30
      [2022/08/06 19:16:24.909] #3 0x0000ffff81cecc7c in _wt_panic_func (session=session@entry=0x35cff100, error=error@entry=2, func=func@entry=0xffff81d514d8 <PRETTY_FUNCTION.38759> "_reconcile", line=line@entry=328, category=category@entry=WT_VERB_DEFAULT, fmt=fmt@entry=0xffff81d51f10 "reconciliation failed after building the disk image") at ../src/support/err.c:553
      [2022/08/06 19:16:24.917] #4 0x0000ffff81cbe424 in __reconcile (page_lockedp=0xffff7ecac59f, flags=1133996544, salvage=0x0, ref=0x549e46e0, session=0x35cff100) at ../src/reconcile/rec_write.c:328
      [2022/08/06 19:16:24.917] #5 __wt_reconcile (session=session@entry=0x35cff100, ref=ref@entry=0x549e46e0, salvage=salvage@entry=0x0, flags=flags@entry=96) at ../src/reconcile/rec_write.c:98
      [2022/08/06 19:16:24.918] #6 0x0000ffff81c46fa8 in __evict_review (inmem_splitp=<synthetic pointer>, evict_flags=0, ref=0x549e46e0, session=0x35cff100) at ../src/evict/evict_page.c:731
      [2022/08/06 19:16:24.925] #7 __wt_evict (session=session@entry=0x35cff100, ref=ref@entry=0x549e46e0, previous_state=previous_state@entry=3 '\003', flags=flags@entry=0) at ../src/evict/evict_page.c:168
      [2022/08/06 19:16:24.927] #8 0x0000ffff81c3ee90 in __evict_page (session=session@entry=0x35cff100, is_server=is_server@entry=false) at ../src/evict/evict_lru.c:2334
      [2022/08/06 19:16:24.927] #9 0x0000ffff81c3f600 in __evict_lru_pages (session=session@entry=0x35cff100, is_server=is_server@entry=false) at ../src/evict/evict_lru.c:1150
      [2022/08/06 19:16:24.932] #10 0x0000ffff81c450f4 in __wt_evict_thread_run (session=0x35cff100, thread=0x35c90af0) at ../src/evict/evict_lru.c:320
      [2022/08/06 19:16:24.937] #11 0x0000ffff81cff5b4 in __thread_run (arg=0x35c90af0) at ../src/support/thread_group.c:31
      [2022/08/06 19:16:24.937] #12 0x0000ffff81ab3624 in start_thread (arg=0xffff81cff564 <__thread_run>) at pthread_create.c:477
      [2022/08/06 19:16:24.937] #13 0x0000ffff819ba49c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
      [2022/08/06 19:16:25.334] 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      [2022/08/06 19:16:25.334] 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      [2022/08/06 19:16:25.611] BFD: warning: /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/./dump_t.130642.core is truncated: expected core file size >= 1334706176, found: 516796416
      [2022/08/06 19:16:25.611] ----------------------
      [2022/08/06 19:16:25.611] | ./dump_t.130642.core |
      [2022/08/06 19:16:25.611] ----------------------
      [2022/08/06 19:16:25.611] [New LWP 132940]
      [2022/08/06 19:16:25.611] [New LWP 132939]
      [2022/08/06 19:16:25.611] [New LWP 130642]
      [2022/08/06 19:16:25.611] [New LWP 130688]
      [2022/08/06 19:16:25.611] [New LWP 130692]
      [2022/08/06 19:16:25.611] [New LWP 130695]
      [2022/08/06 19:16:25.611] [New LWP 130694]
      [2022/08/06 19:16:25.611] [New LWP 132938]
      [2022/08/06 19:16:25.611] [New LWP 132942]
      [2022/08/06 19:16:25.611] [New LWP 132941]
      [2022/08/06 19:16:25.611] [New LWP 130693]
      [2022/08/06 19:16:25.611] Cannot access memory at address 0xffffb6cde1e0
      [2022/08/06 19:16:25.611] Cannot access memory at address 0xffffb6cde1d8
      [2022/08/06 19:16:25.612] Core was generated by `./t -c /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/f'.
      [2022/08/06 19:16:25.612] Program terminated with signal SIGABRT, Aborted.
      [2022/08/06 19:16:25.612] #0 0x0000ffffb65d7d78 in ?? ()
      [2022/08/06 19:16:25.612] Failed to read a valid object file image from memory.
      [2022/08/06 19:16:25.612] Cannot access memory at address 0xffffb6cde1e0
      [2022/08/06 19:16:25.612] [Current thread is 1 (LWP 132940)]
      [2022/08/06 19:16:25.612] #0 0x0000ffffb65d7d78 in ?? ()
      [2022/08/06 19:16:25.612] #1 0x0000ffffb1161f18 in ?? ()
      [2022/08/06 19:16:25.681] Cannot access memory at address 0xffffb6cde1d8
      [2022/08/06 19:16:25.681] BFD: warning: /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/./dump_t.130642.core is truncated: expected core file size >= 1334706176, found: 516796416
      [2022/08/06 19:16:25.681] Cannot access memory at address 0xffffb6cde1e0
      [2022/08/06 19:16:25.681] Cannot access memory at address 0xffffb6cde1d8
      [2022/08/06 19:16:25.681] Failed to read a valid object file image from memory.
      [2022/08/06 19:16:25.681] Cannot access memory at address 0xffffb6cde1e0
      [2022/08/06 19:16:25.688] Backtrace stopped: previous frame identical to this frame (corrupt stack?)
      [2022/08/06 19:16:25.688] ----------------------
      [2022/08/06 19:16:25.688] | ./dump_t.131761.core |
      [2022/08/06 19:16:25.688] ----------------------
      [2022/08/06 19:16:25.748] Cannot access memory at address 0xffffb6cde1d8
      [2022/08/06 19:16:25.748] BFD: warning: /data/mci/626bd9673ee1cabde887d60cdb65878f/wiredtiger/cmake_build/test/format/./dump_t.131761.core is truncated: expected core file size >= 512172032, found: 317800448

       

      The issue is the format.sh script does not correctly resolve its processes using kill for the process group where it should do it for the process and calling wait when its not a sub process. Fixing this should prevent this additional cores which are false issues.

            Assignee:
            mick.graham@mongodb.com Mick Graham
            Reporter:
            mick.graham@mongodb.com Mick Graham
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: