WiredTiger / WT-2985

Race during checkpoint can cause a core dump

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.2.11, 3.4.0-rc2
    • Labels:
      None
    • # Replies:
      7
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-10-31

      Description

      Running in HAVE_DIAGNOSTIC mode, the failure is as follows:

      file:test00000.wt, WT_SESSION.checkpoint: src/reconcile/rec_write.c, 6061: size != 0 || cell_type == WT_CELL_ADDR_DEL
      file:test00000.wt, WT_SESSION.checkpoint: aborting WiredTiger library
      file:test00000.wt, WT_SESSION.checkpoint: process ID 23234: waiting for debugger...
       
      (gdb) where
      #0  0x00007fc5e378e3c3 in select () from /lib64/libc.so.6
      #1  0x00000000004526bc in __wt_sleep (seconds=10, micro_seconds=0) at src/os_posix/os_sleep.c:23
      #2  0x00000000004897dc in __wt_attach (session=0x10bcb40) at src/support/global.c:125
      #3  0x000000000051e3e2 in __wt_abort (session=0x10bcb40) at src/os_common/os_abort.c:22
      #4  0x000000000048933d in __wt_assert (session=0x10bcb40, error=0, file_name=0x561479 "src/reconcile/rec_write.c", line_number=6061, fmt=0x56126a "%s") at src/support/err.c:482
      #5  0x000000000046ac42 in __rec_cell_build_addr (session=0x10bcb40, r=0x7fc0e43b9920, addr=0x0, size=0, cell_type=48, recno=0) at src/reconcile/rec_write.c:6061
      #6  0x00000000004685ef in __rec_row_merge (session=0x10bcb40, r=0x7fc0e43b9920, page=0x7fc57d0b8b70) at src/reconcile/rec_write.c:4983
      #7  0x0000000000468014 in __rec_row_int (session=0x10bcb40, r=0x7fc0e43b9920, page=0x7fc5aceef890) at src/reconcile/rec_write.c:4857
      #8  0x000000000045f79a in __wt_reconcile (session=0x10bcb40, ref=0x7fc5c404c2a0, salvage=0x0, flags=1) at src/reconcile/rec_write.c:415
      #9  0x00000000004d04d8 in __sync_file (session=0x10bcb40, syncop=WT_SYNC_CHECKPOINT) at src/btree/bt_sync.c:197
      #10 0x00000000004d06c4 in __wt_cache_op (session=0x10bcb40, op=WT_SYNC_CHECKPOINT) at src/btree/bt_sync.c:277
      #11 0x000000000049c170 in __checkpoint_tree (session=0x10bcb40, is_checkpoint=true, cfg=0x7fc5e117fe70) at src/txn/txn_ckpt.c:1390
      #12 0x000000000049c432 in __checkpoint_tree_helper (session=0x10bcb40, cfg=0x7fc5e117fe70) at src/txn/txn_ckpt.c:1497
      #13 0x0000000000498426 in __checkpoint_apply (session=0x10bcb40, cfg=0x7fc5e117fe70, op=0x49c3f3 <__checkpoint_tree_helper>) at src/txn/txn_ckpt.c:190
      #14 0x000000000049a691 in __txn_checkpoint (session=0x10bcb40, cfg=0x7fc5e117fe70) at src/txn/txn_ckpt.c:731
      #15 0x000000000049b2ae in __wt_txn_checkpoint (session=0x10bcb40, cfg=0x7fc5e117fe70) at src/txn/txn_ckpt.c:916
      #16 0x0000000000483e25 in __session_checkpoint (wt_session=0x10bcb40, config=0x0) at src/session/session_api.c:1579
      

      The problem is we're building an address that will reference a leaf page, but the leaf page has a disk image instead of an address.
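
The diagnostic check that fires at rec_write.c:6061 can be modelled in isolation. The sketch below is a toy, not WiredTiger code: the `toy_` names and the enum values are assumptions made for illustration, and only the shape of the condition (`size != 0 || cell_type == WT_CELL_ADDR_DEL`) comes from the assertion message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cell types; the values are illustrative, not WiredTiger's. */
enum toy_cell_type {
    TOY_CELL_ADDR_DEL,  /* child was deleted: an empty address is acceptable */
    TOY_CELL_ADDR_LEAF  /* child is a leaf: an on-disk address is required */
};

/*
 * Mirrors the shape of the failed assertion: an address cell needs a
 * non-empty address unless it describes a deleted child.
 */
bool
toy_addr_cell_ok(size_t size, enum toy_cell_type cell_type)
{
    return (size != 0 || cell_type == TOY_CELL_ADDR_DEL);
}
```

In the backtrace, frame #5 is called with `addr=0x0` and `size=0` for a cell type that is not the deleted type, which corresponds to `toy_addr_cell_ok(0, TOY_CELL_ADDR_LEAF)` returning false.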

      To reproduce the problem, apply this diff:

      diff --git a/bench/wtperf/runners/500m-btree-50r50u.wtperf b/bench/wtperf/runners/500m-btree-50r50u.wtperf
      index 06745bf..452ce43 100644
      --- a/bench/wtperf/runners/500m-btree-50r50u.wtperf
      +++ b/bench/wtperf/runners/500m-btree-50r50u.wtperf
      @@ -7,7 +7,7 @@
       # checkpoints.  Collect wiredtiger stats for ftdc.
       conn_config="cache_size=16G,checkpoint=(wait=60,log_size=2GB),session_max=20000,log=(enabled),statistics=(fast),statistics_log=(wait=30,json),eviction=(threads_max=4)"
       create=false
      -compression="snappy"
      +compression="zlib"
       sess_config="isolation=snapshot"
       table_count=2
       key_sz=40
      diff --git a/bench/wtperf/runners/500m-btree-populate.wtperf b/bench/wtperf/runners/500m-btree-populate.wtperf
      index f9aed09..170ea0e 100644
      --- a/bench/wtperf/runners/500m-btree-populate.wtperf
      +++ b/bench/wtperf/runners/500m-btree-populate.wtperf
      @@ -11,7 +11,7 @@
       # well and be small on disk.
       conn_config="cache_size=16G,checkpoint=(wait=60,log_size=2GB),session_max=20000,log=(enabled),statistics=(fast),statistics_log=(wait=30,json),eviction=(threads_max=4)"
       compact=true
      -compression="snappy"
      +compression="zlib"
       sess_config="isolation=snapshot"
       table_config="internal_page_max=16K,type=file,leaf_page_max=16K,memory_page_max=10M,split_pct=90"
       table_count=2
      

      Then run 500m-btree-populate.wtperf followed by 500m-btree-50r50u.wtperf. It reproduces fairly reliably for me, but can take 3-4 runs, with each run lasting 30 minutes to an hour.


          Activity

          keith.bostic Keith Bostic added a comment -

          The problem is a page left in a state post-eviction that precludes checkpoint writing it, and then checkpoint attempts to write it.

          Here are the steps:

          1. we try to evict a page with update-restore, and reconciliation leaves it with a disk image
          2. when reconciliation returns, we attempt to resolve the split, but split resolution fails because we can't get the parent page lock
          3. checkpoint then skips re-reconciling the page because the page passes this test:

          if (!WT_PAGE_IS_INTERNAL(page) &&
              F_ISSET(txn, WT_TXN_HAS_SNAPSHOT) &&
              WT_TXNID_LT(txn->snap_max, mod->first_dirty_txn)) {
                  __wt_page_modify_set(session, page);
                  continue;
          }
          

          4. the page is then left in its post-reconciliation state, and checkpoint will attempt to merge the page's addresses (which it doesn't have) into its parent internal page.
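
The steps above can be sketched as a toy model. Everything here is a simplification under stated assumptions: the `toy_` structures and the `has_addr` flag are hypothetical, transaction IDs are compared with a plain `<`, and the guarded variant is only one way to express the rule that a never-written leaf must not be skipped. It is not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the page and transaction state. */
struct toy_page {
    bool is_internal;         /* internal pages are never skipped */
    bool has_addr;            /* assumption: page was written and has an address */
    uint64_t first_dirty_txn; /* ID of the first transaction that dirtied the page */
};

struct toy_txn {
    bool has_snapshot;
    uint64_t snap_max;
};

/* The skip test quoted above, with WT_TXNID_LT reduced to a plain comparison. */
bool
toy_ckpt_skip_original(const struct toy_txn *txn, const struct toy_page *page)
{
    return (!page->is_internal && txn->has_snapshot &&
        txn->snap_max < page->first_dirty_txn);
}

/* Guarded variant: only skip a leaf that already has an address to merge. */
bool
toy_ckpt_skip_guarded(const struct toy_txn *txn, const struct toy_page *page)
{
    return (page->has_addr && toy_ckpt_skip_original(txn, page));
}
```

For a leaf left with a disk image and no address (`has_addr == false`) whose `first_dirty_txn` is newer than the checkpoint's `snap_max`, the original test returns true and the page is skipped, while the guarded test returns false and the page is reconciled again.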

          keith.bostic Keith Bostic added a comment -

          Alexander Gorrod, Michael Cahill: I'd been working on this problem in my Zstd branch because I never saw it fire in any other branch. I think I just got lucky in my original testing; I've now reproduced the problem in develop, so I'm opening a ticket for it.

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2985 checkpoint core dump (#3100)

          • Checkpoint must not skip writing a leaf page that's never been written before.
          • Split out the code to make a tree dirty: checkpoint needs it, and it's relatively expensive to dirty a page.

          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/ceeb57b565fca6ade4bb02d8cf62095374743bd1

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2985 checkpoint core dump (#3100)

          • Checkpoint must not skip writing a leaf page that's never been written before.
          • Split out the code to make a tree dirty: checkpoint needs it, and it's relatively expensive to dirty a page.

          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/ceeb57b565fca6ade4bb02d8cf62095374743bd1

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Import wiredtiger: ef9a7983ea47cea78400a4472a3d4e46735385c5 from branch mongodb-3.4

          ref: 6a31c2118c..ef9a7983ea
          for: 3.4.0-rc2

          WT-1592 Add ability to dump detailed cache information via statistics
          WT-2403 Enhance random cursor implementation for LSM trees
          WT-2880 Add support for Zstandard compression
          WT-2904 Fix a bug where the reported checkpoint size could be many times data size
          WT-2949 Add an option to wtperf to not close connection on shutdown
          WT-2954 Inserting multi-megabyte values can cause large in-memory pages
          WT-2955 Add statistics tracking the amount of time threads spend waiting for high level locks
          WT-2956 utility tests -h option is always overridden by the default setup
          WT-2959 Ensure WT_SESSION_IMPL is never used before it's initialized
          WT-2963 Race setting max_entries during eviction
          WT-2965 test_wt2323_join_visibility can hang on OSX
          WT-2974 lint
          WT-2976 Add a statistic tracking how long application threads spend doing I/O
          WT-2977 Csuite LSM Random test can occasionally fail
          WT-2985 Race during checkpoint can cause a core dump
          WT-2987 Fix a bug where opening a cursor on an incomplete table drops core
          WT-2988 Fix a bug where __wt_epoch potentially returns garbage values.
          Branch: master
          https://github.com/mongodb/mongo/commit/0609d0ce2ef563d7a4cde77d46396fe5c92c6df1

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2985 checkpoint core dump (#3100)

          • Checkpoint must not skip writing a leaf page that's never been written before.
          • Split out the code to make a tree dirty: checkpoint needs it, and it's relatively expensive to dirty a page.

          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/ceeb57b565fca6ade4bb02d8cf62095374743bd1

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Import wiredtiger: b11ed312cedb905dec49dd2c9c262fabf64d13cd from branch mongodb-3.2

          ref: 9cf2f89d6d..b11ed312ce
          for: 3.2.11

          WT-1592 Dump detailed cache information via statistics
          WT-2403 Enhance random cursor implementation for LSM trees
          WT-2831 Skip creating a checkpoint if there have been no changes
          WT-2858 rename wtperf's CONFIG structure
          WT-2880 Add support for Zstandard compression
          WT-2895 Reduce the runtime of make check testing with disable long
          WT-2904 Fix a bug where the reported checkpoint size could be many times data size
          WT-2907 Bug in Java ConcurrentCloseTest case
          WT-2917 split wtperf's configuration into per-database and per-run parts
          WT-2920 Add statistic tracking application thread cache maintenance time
          WT-2931 Configure default in-memory dirty cache usage lower
          WT-2932 Allow applications to selectively ignore cache limit with in-memory configuration
          WT-2933 Fix a race between named snapshots and checkpoints
          WT-2937 test_inmem01 aborts due to stuck cache
          WT-2938 Assembly files should end in .sx, not .S
          WT-2941 Improve test/format to use faster key-generation functions
          WT-2942 verbose strings don't need newline
          WT-2946 dist/s_docs incompatible with OS X Xcode installation
          WT-2948 simplify error handling by making epoch time return never fail
          WT-2949 Add an option to wtperf to not close connection on shutdown
          WT-2950 Inserting multi-megabyte values can cause large in-memory pages
          WT-2954 Inserting multi-megabyte values can cause large in-memory pages
          WT-2955 Add statistics tracking the amount of time threads spend waiting for high level locks
          WT-2956 utility tests -h option is always overridden by the default setup
          WT-2959 Ensure WT_SESSION_IMPL is never used before it's initialized
          WT-2963 Race setting max_entries during eviction
          WT-2965 test_wt2323_join_visibility can hang on OSX
          WT-2974 lint
          WT-2976 Add a statistic tracking how long application threads spend doing I/O
          WT-2977 Csuite LSM Random test can occasionally fail
          WT-2985 Race during checkpoint can cause a core dump
          WT-2987 Fix a bug where opening a cursor on an incomplete table drops core
          WT-2988 __wt_epoch potentially returns garbage values.
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/ebbb4eb0b091fa185b06a060d24b68eb6761ba4a


            People

            • Assignee:
              keith.bostic Keith Bostic
              Reporter:
              keith.bostic Keith Bostic
              Participants:
              Last commenter:
              Ramon Fernandez
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                30 weeks, 5 days ago
                Date of 1st Reply:
