Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-1919

Segfault in reconciliation during checkpoint

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.6.1
    • Labels:
      None
    • # Replies:
      10
    • Last comment by Customer:
      true

      Description

      Stress test
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress/6837

      failed on bengal with

      Backtrace:
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/reconcile/rec_write.c:4260(__rec_row_merge)[0x457c46]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/reconcile/rec_write.c:4146(__rec_row_int)[0x4576dc]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/reconcile/rec_write.c:409(__wt_reconcile)[0x44fe9d]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/btree/bt_sync.c:165(__sync_file)[0x49ed19]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/btree/bt_sync.c:256(__wt_cache_op)[0x49ef59]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/txn/txn_ckpt.c:981(__checkpoint_worker)[0x47863d]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/txn/txn_ckpt.c:1063(__wt_checkpoint)[0x478952]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/txn/txn_ckpt.c:184(__checkpoint_apply)[0x4762f4]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/txn/txn_ckpt.c:464(__wt_txn_checkpoint)[0x47736b]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/../src/session/session_api.c:912(__session_checkpoint)[0x469dee]
      /home/jenkins/jenkins/workspace/wiredtiger-test-format-stress/build_posix/test/format/../../../test/format/ops.c:351(ops)[0x40f691]
      

      CONFIG:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      firstfit=0
      bitcnt=1
      bloom=1
      bloom_bit_count=14
      bloom_hash_count=15
      bloom_oldest=0
      cache=63
      checkpoints=1
      checksum=off
      chunk_size=4
      compaction=0
      compression=lz4
      data_extend=0
      data_source=table
      delete_pct=20
      dictionary=0
      evict_max=0
      file_type=row-store
      backups=0
      huffman_key=0
      huffman_value=0
      insert_pct=20
      internal_key_truncation=1
      internal_page_max=15
      isolation=random
      key_gap=15
      key_max=98
      key_min=10
      leak_memory=0
      leaf_page_max=10
      logging=0
      logging_archive=0
      logging_prealloc=0
      lsm_worker_threads=4
      merge_max=12
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=8
      repeat_data_pct=38
      reverse=0
      rows=100000
      runs=100
      split_pct=53
      statistics=0
      statistics_server=0
      threads=10
      timer=20
      value_max=3618
      value_min=7
      wiredtiger_config=
      write_pct=13
      ############################################
      

        Issue Links

          Activity

          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: Don't skip reconciliation of pages during checkpoint if they were rewritten in memory.

          refs WT-1919
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/52b873402a6c9fa3dd1a1bbee8671d052ecfb965

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: Don't skip reconciliation of pages during checkpoint if they were rewritten in memory. refs WT-1919 Branch: develop https://github.com/wiredtiger/wiredtiger/commit/52b873402a6c9fa3dd1a1bbee8671d052ecfb965
          Hide
          keith.bostic Keith Bostic added a comment -

          [Michael Cahill, staring at this code:

          /*
           * Write dirty pages, unless we can be sure they only
           * became dirty after the checkpoint started.
           *
           * We can skip dirty pages if:
           * (1) they are leaf pages;
           * (2) there is a snapshot transaction active (which
           *     is the case in ordinary application checkpoints
           *     but not all internal cases); and
           * (3) the first dirty update on the page is
           *     sufficiently recent that the checkpoint
           *     transaction would skip them.
           *
           * Mark the tree dirty: the checkpoint marked it clean
           * and we can't skip future checkpoints until this page
           * is written.
           */
          if (!WT_PAGE_IS_INTERNAL(page) &&
              F_ISSET(txn, WT_TXN_HAS_SNAPSHOT) &&
              WT_TXNID_LT(txn->snap_max, mod->first_dirty_txn) &&
              !F_ISSET(mod, WT_PM_REC_REWRITE)) {
                  __wt_page_modify_set(session, page);
                  continue;
          }
          

          I think I understand case #4, why pages with WT_PM_REC_REWRITE are skipped, but maybe it's worth adding (4) to the comment? I don't think it's obvious.

          Show
          keith.bostic Keith Bostic added a comment - [ Michael Cahill , staring at this code: /* * Write dirty pages, unless we can be sure they only * became dirty after the checkpoint started. * * We can skip dirty pages if: * (1) they are leaf pages; * (2) there is a snapshot transaction active (which * is the case in ordinary application checkpoints * but not all internal cases); and * (3) the first dirty update on the page is * sufficiently recent that the checkpoint * transaction would skip them. * * Mark the tree dirty: the checkpoint marked it clean * and we can't skip future checkpoints until this page * is written. */ if (!WT_PAGE_IS_INTERNAL(page) && F_ISSET(txn, WT_TXN_HAS_SNAPSHOT) && WT_TXNID_LT(txn->snap_max, mod->first_dirty_txn) && !F_ISSET(mod, WT_PM_REC_REWRITE)) { __wt_page_modify_set(session, page); continue; } I think I understand case #4, why pages with WT_PM_REC_REWRITE are skipped, but maybe it's worth adding (4) to the comment? I don't think it's obvious.
          Hide
          keith.bostic Keith Bostic added a comment -

          On second thought... maybe I don't understand. The code that rewrites the page in memory (split_multi_inmem), has this:

          /*
           * We modified the page above, which will have set the first dirty
           * transaction to the last transaction currently running.  However, the
           * updates we installed may be older than that.  Set the first dirty
           * transaction to an impossibly old value so this page is never skipped
           * in a checkpoint.
           */
          page->modify->first_dirty_txn = WT_TXN_FIRST;
          

          why isn't that sufficient?

          Show
          keith.bostic Keith Bostic added a comment - On second thought... maybe I don't understand. The code that rewrites the page in memory ( split_multi_inmem ), has this: /* * We modified the page above, which will have set the first dirty * transaction to the last transaction currently running. However, the * updates we installed may be older than that. Set the first dirty * transaction to an impossibly old value so this page is never skipped * in a checkpoint. */ page->modify->first_dirty_txn = WT_TXN_FIRST; why isn't that sufficient?
          Hide
          keith.bostic Keith Bostic added a comment -

          Here's where I am on this:

          WT_PM_REC_REWRITE is a short-lived page state set in one specific path, when forced eviction of a leaf page fails because there's nothing much to write. The fact it must be a leaf page during eviction is because it is only set in a reconciliation call where WT_EVICT_UPDATE_RESTORE was set. It's short-lived because before we give up the exclusive lock on the page, we create a new page and swap it into place, and discard the page where WT_PM_REC_REWRITE was set.

          For these reasons...

          sync_file (the WT_SYNC_CHECKPOINT operation) should not be checking for WT_PM_REC_REWRITE, it should never see that flag as the checkpoint will be blocked from the leaf page until the eviction is complete.

          rec_root_write should not be checking for WT_PM_REC_REWRITE, it should never see that flag because it's writing internal pages, not leaf pages (and it's not evicting, anyway).

          rec_col_int should not be checking for WT_PM_REC_REWRITE, because any write of a page's column-store internal page parent will be blocked from evaluating the leaf child until the eviction is complete.

          rec_write_wrapup should not be checking for WT_PM_REC_REWRITE, because any previous reconciliation that set WT_PM_REC_REWRITE will have cleared that flag before this reconciliation could proceed.

          rec_write_wrapup_err should not be checking for WT_PM_REC_REWRITE because it's only set when a reconciliation is already considered successful.

          This pretty much removes the need for the WT_PM_REC_REWRITE state – the additional error handling makes me think we'd be better off getting rid of it and using WT_PM_REC_MULTIBLOCK with a single test, if we ended up with 1 block or several blocks.

          Show
          keith.bostic Keith Bostic added a comment - Here's where I am on this: WT_PM_REC_REWRITE is a short-lived page state set in one specific path, when forced eviction of a leaf page fails because there's nothing much to write. The fact it must be a leaf page during eviction is because it is only set in a reconciliation call where WT_EVICT_UPDATE_RESTORE was set. It's short-lived because before we give up the exclusive lock on the page, we create a new page and swap it into place, and discard the page where WT_PM_REC_REWRITE was set. For these reasons... sync_file (the WT_SYNC_CHECKPOINT operation) should not be checking for WT_PM_REC_REWRITE , it should never see that flag as the checkpoint will be blocked from the leaf page until the eviction is complete. rec_root_write should not be checking for WT_PM_REC_REWRITE , it should never see that flag because it's writing internal pages, not leaf pages (and it's not evicting, anyway). rec_col_int should not be checking for WT_PM_REC_REWRITE , because any write of a page's column-store internal page parent will be blocked from evaluating the leaf child until the eviction is complete. rec_write_wrapup should not be checking for WT_PM_REC_REWRITE , because any previous reconciliation that set WT_PM_REC_REWRITE will have cleared that flag before this reconciliation could proceed. rec_write_wrapup_err should not be checking for WT_PM_REC_REWRITE because it's only set when a reconciliation is already considered successful. This pretty much removes the need for the WT_PM_REC_REWRITE state – the additional error handling makes me think we'd be better off getting rid of it and using WT_PM_REC_MULTIBLOCK with a single test, if we ended up with 1 block or several blocks.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'}

          Message: WT-1919, WT-2108: The code to create the information for an rewrite of
          a single block in memory wasn't setting the multiblock key; should its
          parent page somehow be written before the leaf page is rewritten and
          attempt to merge the child page's information into the parent, we could
          drop core. I still don't see the path where that's possible in the
          current system, but it's the right thing to do regardless, the block
          should always be consistent.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/cb8e834c550da6ca45b0551c8d212194ef95adee

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'} Message: WT-1919 , WT-2108 : The code to create the information for an rewrite of a single block in memory wasn't setting the multiblock key; should its parent page somehow be written before the leaf page is rewritten and attempt to merge the child page's information into the parent, we could drop core. I still don't see the path where that's possible in the current system, but it's the right thing to do regardless, the block should always be consistent. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/cb8e834c550da6ca45b0551c8d212194ef95adee

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 32 weeks, 5 days ago
                Date of 1st Reply: