Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.4.0-rc4, 3.2.12
    • Labels: None
    • # Replies: 6
    • Last comment by Customer: true
    • Sprint: Storage 2016-11-21

      Description

      Test format was found hung on zSeries after 11 hours.

      Attempting x86 repro now:
      CONFIG

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=0
      backups=0
      bitcnt=4
      bloom=1
      bloom_bit_count=52
      bloom_hash_count=15
      bloom_oldest=1
      cache=3
      checkpoints=1
      checksum=uncompressed
      chunk_size=1
      compaction=0
      compression=lz4
      data_extend=0
      data_source=table
      delete_pct=6
      dictionary=0
      direct_io=0
      encryption=none
      evict_max=4
      file_type=row-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      in_memory=0
      insert_pct=25
      internal_key_truncation=1
      internal_page_max=10
      isolation=random
      key_gap=5
      key_max=92
      key_min=15
      leaf_page_max=17
      leak_memory=0
      logging=0
      logging_archive=0
      logging_compression=none
      logging_prealloc=0
      long_running_txn=0
      lsm_worker_threads=3
      merge_max=7
      mmap=1
      ops=100000
      prefix_compression=0
      prefix_compression_min=3
      quiet=1
      repeat_data_pct=89
      reverse=0
      rows=100000
      runs=100
      rebalance=1
      salvage=1
      split_pct=76
      statistics=0
      statistics_server=0
      threads=3
      timer=20
      transaction-frequency=24
      value_max=656
      value_min=13
      verify=1
      wiredtiger_config=
      write_pct=68
      ############################################
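To compare this CONFIG against other failing runs, it can help to load it into a dictionary first. The following is a minimal sketch, assuming the file is plain `key=value` lines with `#` comment lines, as shown above. The function name `parse_format_config` and the shortened `SAMPLE` text are hypothetical and are not part of test/format.

```python
def parse_format_config(text):
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; an empty value (e.g. wiredtiger_config=) is kept.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# A shortened excerpt of the CONFIG above, used only for illustration.
SAMPLE = """\
############################################
#  RUN PARAMETERS
############################################
cache=3
evict_max=4
rows=100000
threads=3
wiredtiger_config=
"""

config = parse_format_config(SAMPLE)
# Pull out the settings most likely to matter for an eviction hang.
eviction_related = {k: config[k] for k in ("cache", "evict_max", "threads")}
print(eviction_related)
```

Values are kept as strings so that non-numeric settings such as `compression=lz4` load without special handling.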
      

        Activity

        david.hows David Hows added a comment -

        I was able to reproduce this on x86 readily.

        With the patch in wt-3023-eviction-tiny-trees, I was seeing hangs where eviction could not find any pages in the wt.wt tree that would pass the test here, and the aggressive score stayed at 0.

        Next questions

        1. What is preventing the oldest ID from moving forward?
        2. Why isn’t this workload getting aggressive?
        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

        Message: WT-3023 Don't treat splits as eviction making progress. (#3151)

        • We still want to get aggressive and/or "stuck" if all we do is rewrite pages over and over.
        • Reset the eviction skip count in tiny trees: we may never have enough pages to hit the target.

        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/9be507a869760c6adff119e6ea3be9e0e67135dd
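The first point of the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not WiredTiger's eviction code: it assumes the eviction server compares a progress counter between passes and raises an aggressive score only when the counter has not moved. The names `run_eviction_passes`, `splits_count_as_progress`, and `stuck_threshold`, and the threshold value of 100, are all hypothetical.

```python
def run_eviction_passes(passes, splits_count_as_progress, stuck_threshold=100):
    """Toy model: every pass only splits (rewrites) a page and frees nothing."""
    progress = 0    # counter the server compares between passes
    aggressive = 0  # grows while the counter stays flat, capped at the threshold
    for _ in range(passes):
        before = progress
        # The pass rewrites one page via a split but evicts nothing.
        if splits_count_as_progress:
            progress += 1
        if progress == before:
            aggressive = min(aggressive + 1, stuck_threshold)
        else:
            aggressive = max(aggressive - 1, 0)
    return aggressive


# If splits count as progress, the score never leaves 0, matching the observed hang.
print(run_eviction_passes(50, splits_count_as_progress=True))
# If splits are ignored, the score climbs and the server can become aggressive.
print(run_eviction_passes(50, splits_count_as_progress=False))
```

In this model, counting splits as progress pins the score at 0 no matter how many passes run, which is consistent with the "aggressive score would be 0" observation earlier in the thread.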
        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

        Message: WT-3023 Don't treat splits as eviction making progress. (#3151)

        • We still want to get aggressive and/or "stuck" if all we do is rewrite pages over and over.
        • Reset the eviction skip count in tiny trees: we may never have enough pages to hit the target.

        Branch: mongodb-3.4
        https://github.com/wiredtiger/wiredtiger/commit/9be507a869760c6adff119e6ea3be9e0e67135dd
        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

        Message: Import wiredtiger: ca6eee06ffdacc8e191987e64b3791740dad21e1 from branch mongodb-3.4

        ref: 74430da40c..ca6eee06ff
        for: 3.4.0

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: master
        https://github.com/mongodb/mongo/commit/fb4ae3792065e98696e391ac1c4602216b8502cb

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

        Message: WT-3023 Don't treat splits as eviction making progress. (#3151)

        • We still want to get aggressive and/or "stuck" if all we do is rewrite pages over and over.
        • Reset the eviction skip count in tiny trees: we may never have enough pages to hit the target.

        Branch: mongodb-3.2
        https://github.com/wiredtiger/wiredtiger/commit/9be507a869760c6adff119e6ea3be9e0e67135dd
        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

        Message: Import wiredtiger: 040e3d6f764c0fb626cb47fede54469f57d0c6e0 from branch mongodb-3.2

        ref: 187707a5c1..040e3d6f76
        for: 3.2.12

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: v3.2
        https://github.com/mongodb/mongo/commit/c586934f7212f6a9a2087cbaf9a8fcd7d7ce9abf


          People

          • Votes: 0
          • Watchers: 2

            Dates

            • Days since reply: 14 weeks, 5 days ago