Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-3980

failure returning a modified update without a backing "real" update

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 3.6.4, 3.7.4, WT3.1.0
    • Affects Version/s: None
    • Component/s: None
    • None

      alexander.gorrod, michael.cahill:

      daniel.gottlieb had a failure today, and I took a look.

      This is in master, commit c5bfa9391f:

      commit c5bfa9391f364ad1f36334d95c487a077aa76cea
      Author: Luke Chen <luke.chen@mongodb.com>
      Date:   Wed Mar 14 16:52:29 2018 +1100
      
          Import wiredtiger: b33708d7d9b2971cda05e71fcba6067b230b97cc from branch mongodb-3.8
      
      Here's the stack:
      
      warning: Source file is more recent than executable.
      892                     *cellp = WT_PAGE_REF_OFFSET(page, ikey->cell_offset);
      [Current thread is 1 (Thread 0x7fe574a90700 (LWP 27446))]
      (gdb) where
      #0  __wt_row_leaf_key_info (ikeyp=0x0, sizep=<synthetic pointer>,
          datap=<synthetic pointer>, cellp=<synthetic pointer>, copy=0x50,
          page=0x7fe5b5027aa0) at src/third_party/wiredtiger/src/include/btree.i:892
      #1  __wt_row_leaf_value_cell (kpack=0x0, rip=0x7fe5b5027af0,
          page=0x7fe5b5027aa0) at src/third_party/wiredtiger/src/include/btree.i:1042
      #2  __value_return (cbt=0x7fe58d7fe8a0, session=0x7fe5b82202e8)
          at src/third_party/wiredtiger/src/btree/bt_ret.c:107
      #3  __wt_value_return_upd (session=session@entry=0x7fe5b82202e8,
          cbt=cbt@entry=0x7fe58d7fe8a0, upd=0x0, upd@entry=0x7fe5b5026520,
          ignore_visibility=ignore_visibility@entry=false)
          at src/third_party/wiredtiger/src/btree/bt_ret.c:241
      #4  0x00007fe5bd879033 in __wt_value_return (
          session=session@entry=0x7fe5b82202e8, cbt=cbt@entry=0x7fe58d7fe8a0,
          upd=upd@entry=0x7fe5b5026520)
          at src/third_party/wiredtiger/src/btree/bt_ret.c:305
      #5  0x00007fe5bd91485d in __cursor_kv_return (upd=0x7fe5b5026520,
          cbt=0x7fe58d7fe8a0, session=0x7fe5b82202e8)
          at src/third_party/wiredtiger/src/include/cursor.i:322
      #6  __wt_btcur_search (cbt=cbt@entry=0x7fe58d7fe8a0)
          at src/third_party/wiredtiger/src/btree/bt_cursor.c:526
      #7  0x00007fe5bd8bce29 in __curfile_search (cursor=0x7fe58d7fe8a0)
          at src/third_party/wiredtiger/src/cursor/cur_file.c:200
      #8  0x00007fe5bd7b6baf in mongo::WiredTigerRecordStoreCursorBase::seekExact (
          this=0x7fe58d7b7740, id=...)
          at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp:1754
      

      What's happening is WT_CURSOR.search found a match in an in-memory page insert list, and the WT_INSERT had a single update on it that consisted of a valid key and a WT_UPDATE_MODIFY record.

      We didn't find a record to back the modify record in the update list, so we fell back to the page itself, but that page had no entries, and we eventually dropped core because the cursor's slot didn't reference anything useful.

      (gdb) frame 6
      #6  __wt_btcur_search (cbt=cbt@entry=0x7fe58d7fe8a0) at src/third_party/wiredtiger/src/btree/bt_cursor.c:526
      526			ret = __cursor_kv_return(session, cbt, upd);
      (gdb) p cbt.ins
      $67 = (WT_INSERT *) 0x7fe58d087cf0
      (gdb) p *$67
      $68 = {upd = 0x7fe5b5026520, u = {recno = 4294967320, key = {offset = 24, size = 1}}, next = 0x7fe58d087d00}
      (gdb) p *$68.upd
      $69 = {txnid = 2212, timestamp = {val = 6533049679926525967}, next = 0x0, size = 40, type = 2 '\002', 
        state = 0 '\000', data = 0x7fe5b502653e "\001"}
      (gdb) p cbt.ref.page
      $70 = (WT_PAGE *) 0x7fe5b5027aa0
      (gdb) p $70.entries
      $71 = 0
      (gdb) p cbt.slot
      $72 = 0
      

      I suspect the most likely problem is we should have some lookaside records and don't, which could have quite a few explanations: we didn't store the records, we stored them but didn't read them, we tried to read them but they weren't there.

      As far as I can tell, we should not have skipped reading the lookaside records:

      (gdb) p *cbt.ref
      $74 = {page = 0x7fe5b5027aa0, home = 0x7fe58d021b20, pindex_hint = 0, state = 5, addr = 0x0, key = {
          recno = 140623893259104, ikey = 0x7fe58d346360}, page_del = 0x0, page_las = 0x7fe5b24c7bc0}
      (gdb) p *cbt.ref.page_las
      $75 = {las_pageid = 56, las_max_txn = 2212, min_timestamp = {val = 6533049679926525966}, onpage_timestamp = {
          val = 0}, eviction_to_lookaside = false, las_skew_newest = false, invalid = false}
      (gdb) p session.txn
      $76 = {id = 0, isolation = WT_ISO_SNAPSHOT, forced_iso = 0, snap_min = 4160, snap_max = 4160, 
        snapshot = 0x7fe5b4dd7020, snapshot_count = 0, txn_logsync = 8, commit_timestamp = {
          val = 6533049868905086994}, first_commit_timestamp = {val = 6533049868905086994}, prepare_timestamp = {
          val = 0}, read_timestamp = {val = 0}, commit_timestampq = {tqe_next = 0x0, tqe_prev = 0x7fe5bab3fa58}, 
        read_timestampq = {tqe_next = 0x0, tqe_prev = 0x0}, clear_ts_queue = false, mod = 0x7fe5b2353020, 
        mod_alloc = 7680, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {l = {offset = 0, file = 0}, 
          file_offset = 0}, ckpt_nsnapshot = 0, ckpt_snapshot = 0x0, full_ckpt = false, rollback_reason = 0x0, 
        flags = 4104}
      (gdb) p/x $76.flags
      $77 = 0x1008
      

      it's certainly possible we read the lookaside table but didn't find the records we expected, but I can't think of any way to prove that one way or the other at this point.

      I'm putting this into "Needs Triage", but absent any way to proceed, I guess we close it as not-reproducible until we see it again.

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: