WiredTiger / WT-3012

Test format hanging on LSM configurations

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.4.0-rc4, 3.2.12
    • Labels:
      None
    • # Replies:
      9
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-11-21

      Description

      Since the changes in WT-3009, a number of test format runs that use LSM have aborted with a stuck cache.

      The failures reproduce fairly quickly (in under 50 runs) on Linux with configurations such as the one below:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      backups=0
      bitcnt=6
      bloom=1
      bloom_bit_count=45
      bloom_hash_count=31
      bloom_oldest=0
      cache=30
      checkpoints=1
      checksum=uncompressed
      chunk_size=1
      compaction=0
      compression=zlib
      data_extend=0
      data_source=lsm
      delete_pct=14
      dictionary=0
      direct_io=0
      encryption=none
      evict_max=4
      file_type=row-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      in_memory=0
      insert_pct=73
      internal_key_truncation=0
      internal_page_max=10
      isolation=random
      key_gap=12
      key_max=64
      key_min=26
      leaf_page_max=17
      leak_memory=0
      logging=1
      logging_archive=0
      logging_compression=none
      logging_prealloc=0
      long_running_txn=0
      lsm_worker_threads=4
      merge_max=17
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=6
      quiet=1
      repeat_data_pct=29
      reverse=0
      rows=100000
      runs=1
      rebalance=1
      salvage=1
      split_pct=85
      statistics=1
      statistics_server=0
      threads=11
      timer=20
      transaction-frequency=36
      value_max=1638
      value_min=15
      verify=1
      wiredtiger_config=
      write_pct=42
      ############################################
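
A minimal sketch of a reproduction loop for the "under 50 runs" claim above. It is not part of the WiredTiger test suite: the function name `run_until_failure`, the `main` wrapper, and the optional timeout are assumptions made for illustration, and the command to run (the test format binary plus whatever arguments load the config above) is supplied by the caller rather than guessed here.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=50, timeout=None):
    """Run cmd repeatedly and report the first failing run.

    Returns the 1-based run number of the first run that exits non-zero
    (an abort shows up as a negative return code) or exceeds the timeout.
    Returns None if all max_runs runs succeed.
    """
    for attempt in range(1, max_runs + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # run is reported as a failure instead of blocking forever.
            return attempt
        if result.returncode != 0:
            return attempt
    return None


def main(argv):
    # argv is the full command line of the test program (hypothetical
    # usage: python repro.py <test-binary> <its arguments...>).
    failed_at = run_until_failure(argv, max_runs=50)
    if failed_at is None:
        print("no failure in 50 runs")
        return 0
    print("failed on run %d" % failed_at)
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Setting a timeout is a design choice rather than a requirement: the failures reported here are aborts, which already produce a non-zero exit, but a timeout also catches a run that hangs without aborting.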
      

      One solution is to adjust the eviction trigger setting that was changed in WT-3009. The more correct option is likely to change how dirty page accounting works in LSM. Currently, dirty pages in the primary LSM chunk count toward the dirty page total. Because these dirty pages are fully expected, capped in size, and dealt with by LSM merges, they can potentially be excluded from the count.
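
The accounting change proposed above can be illustrated with a toy model. This is a sketch under stated assumptions, not WiredTiger's implementation: the `Tree` class, the `is_lsm_primary` flag, the function names, the 20% trigger, and all byte counts are hypothetical and chosen only to show the effect of excluding the primary chunk.

```python
from dataclasses import dataclass

MB = 1 << 20


@dataclass
class Tree:
    """Hypothetical stand-in for a tree with dirty pages in cache."""
    name: str
    dirty_bytes: int
    is_lsm_primary: bool = False


def total_dirty_bytes(trees, count_lsm_primary):
    """Sum dirty bytes, optionally skipping the LSM primary chunk."""
    return sum(
        t.dirty_bytes
        for t in trees
        if count_lsm_primary or not t.is_lsm_primary
    )


def over_dirty_trigger(trees, cache_bytes, trigger_pct, count_lsm_primary):
    """True if dirty content exceeds trigger_pct percent of the cache."""
    dirty = total_dirty_bytes(trees, count_lsm_primary)
    return dirty * 100 > cache_bytes * trigger_pct


# Illustrative numbers only: a 30MB cache with most dirty data in the
# primary chunk, which is expected to stay dirty until it is switched out.
trees = [
    Tree("lsm-primary-chunk", 8 * MB, is_lsm_primary=True),
    Tree("older-chunk", 1 * MB),
    Tree("metadata", 1 * MB),
]

counted = over_dirty_trigger(trees, 30 * MB, 20, count_lsm_primary=True)
excluded = over_dirty_trigger(trees, 30 * MB, 20, count_lsm_primary=False)
print(counted, excluded)
```

With the primary chunk counted, the model sits permanently over the trigger even though eviction is not expected to clean that chunk; with it excluded, only the remaining dirty content is compared against the trigger.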

          Activity

          xgen-internal-githook Githook User added a comment -

          Author: David Hows (daveh86) <howsdav@gmail.com>

          Message: WT-3012 Don't track the LSM Primary as part of dirty bytes in cache (#3136)
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/dc2d7aa2d1f5756bdf522168fc056824543eb067

          michael.cahill Michael Cahill added a comment -

          The first commit introduced failures (see WT-3019), reopening.

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: WT-3012 Check a btree is LSM primary before switching. (#3143)
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/05e9389be80f28a63327d62d3d2e2eb1ecc3e14b

          xgen-internal-githook Githook User added a comment -

          Author: David Hows (daveh86) <howsdav@gmail.com>

          Message: WT-3012 Don't track the LSM Primary as part of dirty bytes in cache (#3136)
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/dc2d7aa2d1f5756bdf522168fc056824543eb067

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: WT-3012 Check a btree is LSM primary before switching. (#3143)
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/05e9389be80f28a63327d62d3d2e2eb1ecc3e14b

          xgen-internal-githook Githook User added a comment -

          Author: David Hows (daveh86) <howsdav@gmail.com>

          Message: WT-3012 Don't track the LSM Primary as part of dirty bytes in cache (#3136)
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/dc2d7aa2d1f5756bdf522168fc056824543eb067

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: WT-3012 Check a btree is LSM primary before switching. (#3143)
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/05e9389be80f28a63327d62d3d2e2eb1ecc3e14b

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: Import wiredtiger: ca6eee06ffdacc8e191987e64b3791740dad21e1 from branch mongodb-3.4

          ref: 74430da40c..ca6eee06ff
          for: 3.4.0

          WT-2962 Provide a way to configure builtin extensions
          WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
          WT-3000 Missing log records in recovery when crashing after a log file switch
          WT-3002 Allow applications to exempt threads from eviction.
          WT-3004 lint: declare functions that don't return a value as void
          WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
          WT-3012 Test format hanging on LSM configurations
          WT-3015 Test format stuck with 2mb cache
          WT-3016 Tests needed for systems without ftruncate
          WT-3017 Hazard pointer race with page replace causes error
          WT-3018 lint
          WT-3020 LSM primary changes impact parallel-pop-lsm load time
          WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
          WT-3023 Test format hang on zSeries
          WT-3024 wtperf medium-lsm-compact test can hang
          Branch: master
          https://github.com/mongodb/mongo/commit/fb4ae3792065e98696e391ac1c4602216b8502cb

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: Import wiredtiger: 040e3d6f764c0fb626cb47fede54469f57d0c6e0 from branch mongodb-3.2

          ref: 187707a5c1..040e3d6f76
          for: 3.2.12

          WT-2962 Provide a way to configure builtin extensions
          WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
          WT-3000 Missing log records in recovery when crashing after a log file switch
          WT-3002 Allow applications to exempt threads from eviction.
          WT-3004 lint: declare functions that don't return a value as void
          WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
          WT-3012 Test format hanging on LSM configurations
          WT-3015 Test format stuck with 2mb cache
          WT-3016 Tests needed for systems without ftruncate
          WT-3017 Hazard pointer race with page replace causes error
          WT-3018 lint
          WT-3020 LSM primary changes impact parallel-pop-lsm load time
          WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
          WT-3023 Test format hang on zSeries
          WT-3024 wtperf medium-lsm-compact test can hang
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/c586934f7212f6a9a2087cbaf9a8fcd7d7ce9abf

