Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.4.0-rc4, 3.2.12
    • Labels:
      None
    • # Replies:
      7
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-11-21

      Description

      Test format will consistently hit a stuck cache with the following config:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      backups=0
      bitcnt=1
      bloom=1
      bloom_bit_count=9
      bloom_hash_count=4
      bloom_oldest=0
      cache=2
      checkpoints=1
      checksum=uncompressed
      chunk_size=8
      compaction=0
      compression=snappy
      data_extend=0
      data_source=file
      delete_pct=31
      dictionary=0
      direct_io=0
      encryption=rotn-7
      evict_max=0
      file_type=row-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      in_memory=0
      insert_pct=42
      internal_key_truncation=1
      internal_page_max=9
      isolation=read-uncommitted
      key_gap=2
      key_max=256
      key_min=256
      leaf_page_max=9
      leak_memory=0
      logging=1
      logging_archive=0
      logging_compression=snappy
      logging_prealloc=0
      long_running_txn=0
      lsm_worker_threads=3
      merge_max=20
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=5
      quiet=1
      repeat_data_pct=33
      reverse=0
      rows=100000
      runs=1
      rebalance=1
      salvage=1
      split_pct=70
      statistics=0
      statistics_server=0
      threads=2
      timer=20
      transaction-frequency=21
      value_max=2786
      value_min=256
      verify=1
      wiredtiger_config=
      write_pct=83
      ############################################
      

      Cache dump:

      ==========
      cache dump
      file:wt(checkpoint=WiredTigerCheckpoint.3):
              internal: 149 pages, 1MB, 149/0 clean/dirty pages, 1/0 clean/dirty MB, 0MB max page, 0MB max dirty page
              leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      file:wt(<live>):
              internal: 81 pages, 0MB, 5/76 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      file:WiredTigerLAS.wt(<live>):
              internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      file:WiredTiger.wt(<live>):
              internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      cache dump: total found = 1MB vs tracked inuse 1MB
      total dirty bytes = 0MB
      ==========
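The dump above is easier to read once the clean/dirty counts are tallied per file. The following is a minimal sketch that parses only the line shapes visible in this report; the helper name `pages_by_kind` and the regular expression are illustrative assumptions, not part of WiredTiger or its tooling.

```python
import re

# Cache dump lines copied from this report (indentation trimmed).
DUMP = """\
file:wt(checkpoint=WiredTigerCheckpoint.3):
internal: 149 pages, 1MB, 149/0 clean/dirty pages, 1/0 clean/dirty MB, 0MB max page, 0MB max dirty page
leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
file:wt(<live>):
internal: 81 pages, 0MB, 5/76 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
file:WiredTigerLAS.wt(<live>):
internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
file:WiredTiger.wt(<live>):
internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
"""

# Matches e.g. "internal: 81 pages, 0MB, 5/76 clean/dirty pages".
PAGE_LINE = re.compile(
    r"^\s*(internal|leaf): (\d+) pages, \d+MB, (\d+)/(\d+) clean/dirty pages"
)


def pages_by_kind(dump):
    """Return {file: {kind: (total, clean, dirty)}} for a dump in the format above."""
    result = {}
    current = None
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("file:"):
            # A "file:" header starts a new per-file section.
            current = stripped.rstrip(":")
            result[current] = {}
            continue
        match = PAGE_LINE.match(line)
        if match and current is not None:
            kind = match.group(1)
            total, clean, dirty = (int(match.group(i)) for i in (2, 3, 4))
            result[current][kind] = (total, clean, dirty)
    return result


if __name__ == "__main__":
    for name, kinds in pages_by_kind(DUMP).items():
        for kind, (total, clean, dirty) in kinds.items():
            print(f"{name} {kind}: {dirty} dirty of {total} pages")
```

For the live `file:wt` tree this reports 76 dirty internal pages out of 81 and no leaf pages at all.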
      

      Run:
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-zseries/13009/

        Activity

        david.hows David Hows added a comment -

        I was able to reproduce this very consistently on x86.

        david.hows David Hows added a comment -

        The issue here appears to be that we can wind up with a cache full of dirty internal pages that aren't being made candidates for eviction.

        This hang looks related to the same issue: http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-zseries/13057/

        A quick check showed 0 dirty leaf pages and 2MB of dirty internal pages, out of a total of 3MB of data in memory.
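The situation described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not WiredTiger's eviction code: the `Page` type, the `eviction_candidates` function, and the `allow_internal_when_no_leaves` flag are all hypothetical, and the page counts are borrowed from the live tree in the cache dump in the description. It only shows why a candidate selection that never considers internal pages has nothing to evict once the cache holds no leaf pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    kind: str    # "internal" or "leaf"
    dirty: bool


def eviction_candidates(pages, allow_internal_when_no_leaves):
    """Toy candidate selection (hypothetical, not the real algorithm).

    Leaf pages are always preferred. Internal pages are considered only
    when no leaf pages remain and the flag permits it.
    """
    leaves = [p for p in pages if p.kind == "leaf"]
    if leaves:
        return leaves
    if allow_internal_when_no_leaves:
        return [p for p in pages if p.kind == "internal"]
    return []


# Cache contents modelled on the live tree: 76 dirty and 5 clean
# internal pages, and no leaf pages.
cache = [Page("internal", True)] * 76 + [Page("internal", False)] * 5

# Never considering internal pages leaves nothing to evict: the cache is stuck.
stuck = eviction_candidates(cache, allow_internal_when_no_leaves=False)

# Considering internal pages once the leaves are gone gives eviction work to do.
unstuck = eviction_candidates(cache, allow_internal_when_no_leaves=True)

if __name__ == "__main__":
    print(len(stuck), len(unstuck))
```

In a real B-tree an internal page can be evicted only when none of its children are in cache, which the toy model ignores.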

        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'daveh86', u'name': u'David Hows', u'email': u'howsdav@gmail.com'}

        Message: WT-3015 Change when we will evict internal pages (#3146)
        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/e11d885f11fc7c47f1a9160087f738da80567ad2

        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'daveh86', u'name': u'David Hows', u'email': u'howsdav@gmail.com'}

        Message: WT-3015 Change when we will evict internal pages (#3146)
        Branch: mongodb-3.4
        https://github.com/wiredtiger/wiredtiger/commit/e11d885f11fc7c47f1a9160087f738da80567ad2

        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

        Message: Import wiredtiger: ca6eee06ffdacc8e191987e64b3791740dad21e1 from branch mongodb-3.4

        ref: 74430da40c..ca6eee06ff
        for: 3.4.0

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: master
        https://github.com/mongodb/mongo/commit/fb4ae3792065e98696e391ac1c4602216b8502cb

        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'daveh86', u'name': u'David Hows', u'email': u'howsdav@gmail.com'}

        Message: WT-3015 Change when we will evict internal pages (#3146)
        Branch: mongodb-3.2
        https://github.com/wiredtiger/wiredtiger/commit/e11d885f11fc7c47f1a9160087f738da80567ad2

        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

        Message: Import wiredtiger: 040e3d6f764c0fb626cb47fede54469f57d0c6e0 from branch mongodb-3.2

        ref: 187707a5c1..040e3d6f76
        for: 3.2.12

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: v3.2
        https://github.com/mongodb/mongo/commit/c586934f7212f6a9a2087cbaf9a8fcd7d7ce9abf


          People

          • Assignee:
            david.hows David Hows
            Reporter:
            david.hows David Hows
            Participants:
            Last commenter:
            Michael Cahill
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
              Days since reply:
              19 weeks, 3 days ago
              Date of 1st Reply:
