WiredTiger / WT-1892

wt command detecting recovery is needed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.6.0
    • Labels:
      None
    • # Replies:
      6
    • Last comment by Customer:
      true

      Description

      test/format has been failing when it runs the wt command, which detects that recovery needs to be run. Something has recently changed. Here's one example:
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress/6690

      The error message:

      lt-wt: WT_RUN_RECOVERY: recovery must be run to continue
      t: standard: dump comparison failed: Unknown error 256
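A note on the "Unknown error 256" text: one plausible reading, which is an assumption and not something this ticket confirms, is that 256 is a raw wait status (as returned by system()) that was formatted as if it were an errno. Under the encoding commonly used on Linux, the exit code lives in the high byte, so 256 would correspond to the wt command exiting with status 1. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def decode_wait_status(status):
    """Split a wait status into (exit_code, signal_number).

    Assumes the common Linux encoding: exit code in bits 8-15,
    terminating signal in the low 7 bits.
    """
    exit_code = (status >> 8) & 0xFF
    signal_number = status & 0x7F
    return exit_code, signal_number


# 256 == 1 << 8, i.e. exit code 1 and no terminating signal.
exit_code, signal_number = decode_wait_status(256)
print(exit_code, signal_number)
```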
      

      Here's the CONFIG:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      firstfit=0
      bitcnt=5
      bloom=1
      bloom_bit_count=58
      bloom_hash_count=21
      bloom_oldest=1
      cache=210
      checkpoints=1
      checksum=uncompressed
      chunk_size=7
      compaction=0
      compression=snappy
      data_extend=0
      data_source=lsm
      delete_pct=0
      dictionary=0
      evict_max=1
      file_type=row-store
      backups=1
      huffman_key=0
      huffman_value=0
      insert_pct=84
      internal_key_truncation=1
      internal_page_max=11
      isolation=read-uncommitted
      key_gap=13
      key_max=116
      key_min=12
      leak_memory=0
      leaf_page_max=12
      logging=1
      logging_archive=1
      logging_prealloc=0
      lsm_worker_threads=3
      merge_max=17
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=4
      repeat_data_pct=40
      reverse=0
      rows=100000
      runs=100
      split_pct=79
      statistics=0
      statistics_server=0
      threads=1
      timer=20
      value_max=1089
      value_min=1
      wiredtiger_config=
      write_pct=76
      ############################################
      

          Activity

          sue.loverso Sue LoVerso added a comment -

          This took 36 iterations to reproduce. It is related to the merge of my checkpoint lsn changes in https://github.com/wiredtiger/wiredtiger/commit/058679e6f5bb8f113202a464fb4989abb8932c3c.

          Those changes should record the LSN immediately before the checkpoint record is written. We do that because we want an actual LSN, not the "next" LSN (which may be at the end of a log file and therefore never really exist).

          The problem is that the turtle file has a checkpoint_lsn:

          ... ,checkpoint_lsn=(2,39777536),...
          

          and a printlog shows this:

            { "lsn" : [2,39777536],
              "hdr_flags" : "",
              "rec_len" : 128,
              "mem_len" : 128,
              "type" : "file_sync",
              "fileid" : 0,
              "start" : 4
            },
            { "lsn" : [2,39777664],
              "hdr_flags" : "",
              "rec_len" : 128,
              "mem_len" : 128,
              "type" : "file_sync",
              "fileid" : 0,
              "start" : 0
            },
            { "lsn" : [2,39777792],
              "hdr_flags" : "",
              "rec_len" : 128,
              "mem_len" : 128,
              "type" : "checkpoint",
              "ckpt_lsn" : [2,39777536]
            },
          [END OF LOG]
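To make the relationship between the two outputs concrete, here is a small Python sketch (not WiredTiger code) that parses the turtle file's checkpoint_lsn and lists the log records at or after it. The `parse_lsn` helper and the `records` list are illustrative; the values are transcribed from the output above.

```python
import re


def parse_lsn(text):
    """Parse an LSN written as "(file,offset)" or "[file,offset]" into a tuple."""
    m = re.search(r"[\(\[]\s*(\d+)\s*,\s*(\d+)\s*[\)\]]", text)
    if m is None:
        raise ValueError("no LSN found in %r" % text)
    return (int(m.group(1)), int(m.group(2)))


checkpoint_lsn = parse_lsn("checkpoint_lsn=(2,39777536)")

# (lsn, type) pairs transcribed from the printlog output.
records = [
    ((2, 39777536), "file_sync"),
    ((2, 39777664), "file_sync"),
    ((2, 39777792), "checkpoint"),
]

# Tuples compare by file number first, then by offset within the file.
trailing = [(lsn, rtype) for lsn, rtype in records if lsn >= checkpoint_lsn]
for lsn, rtype in trailing:
    print(lsn, rtype)
```

The recorded checkpoint LSN lands on the first file_sync record, so three records (two file_sync and the checkpoint record itself) sit at or after it in the log.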
          

          sue.loverso Sue LoVerso added a comment -

          Michael and I discussed this, and the new rule for determining whether recovery can be skipped is too limited. There can be races where log records get written on a clean shutdown that throw off that simple determination. I am testing a fix now.

          xgen-internal-githook Githook User added a comment -

          Author: Susan LoVerso (sueloverso) <sue@wiredtiger.com>

          Message: Look for any number of non-data-changing log records to determine if we
          can skip recovery. WT-1892
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/abb0bb80cc6dce29b8db61c6747c228c2701ae5a
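As a rough illustration of the rule in the commit message, the Python sketch below treats recovery as skippable when every log record at or after the checkpoint LSN is a non-data-changing type. This is a sketch under stated assumptions, not the code from the commit: the function name, the `NON_DATA_CHANGING` set, and the "commit" record in the second case are hypothetical, and only the file_sync and checkpoint types are taken from this ticket.

```python
# Assumption: record types treated as not changing data. Only these two
# appear in the ticket; the actual set is defined by WiredTiger itself.
NON_DATA_CHANGING = {"file_sync", "checkpoint"}


def can_skip_recovery(records, checkpoint_lsn):
    """Return True if every record at or after checkpoint_lsn is non-data-changing.

    records is a list of ((file, offset), type) pairs in log order.
    """
    return all(
        rtype in NON_DATA_CHANGING
        for lsn, rtype in records
        if lsn >= checkpoint_lsn
    )


checkpoint_lsn = (2, 39777536)

# The log from this ticket: any number of file_sync records, then a checkpoint.
clean_log = [
    ((2, 39777536), "file_sync"),
    ((2, 39777664), "file_sync"),
    ((2, 39777792), "checkpoint"),
]

# Hypothetical log with a data-changing record after the checkpoint LSN.
dirty_log = clean_log + [((2, 39777920), "commit")]

print(can_skip_recovery(clean_log, checkpoint_lsn))
print(can_skip_recovery(dirty_log, checkpoint_lsn))
```

With the log from this ticket the check passes regardless of how many file_sync records precede the checkpoint record, while the hypothetical trailing data-changing record makes it fail.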

          sue.loverso Sue LoVerso added a comment -

          Michael Cahill, this is fixed in develop. Please review the change. It is a small change, and I wanted to get the Jenkins tests working again.

          michael.cahill Michael Cahill added a comment -

          Thanks Sue LoVerso, lgtm.

          sue.loverso Sue LoVerso added a comment -

          With the fix, I believe this one is ready to resolve/close.

