Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6340

Test/checkpoint configuration creates 85MB page in history store

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • 5
    • Storage - Ra 2020-06-15

      While investigating WT-6257 I've seen a lot of cases failing with cache stuck full. The configuration I'm running is:

      ./t -h WT_TEST -t r -r 2 -W 8 -n 400000 -k 100000 -C "cache_size=100MB"
      

      The failure and cache dump looks like:

      [1590803721:854510][30113:0x7ff5bd0cc700], t, eviction-server: __evict_server, 453: Cache stuck for too long, giving up: Connection timed out
      [1590803721:854633][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2602: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      [1590803721:854644][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2603: cache dump
      [1590803721:854651][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2605: cache full: no
      [1590803721:854658][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2607: cache clean check: yes (95.003%)
      [1590803721:854668][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2609: cache dirty check: yes (94.677%)
      [1590803721:854675][30113:0x7ff5bd0cc700], t, file:__wt0002.wt, eviction-server: __verbose_dump_cache_single, 2481: file:__wt0002.wt(<live>):
      [1590803721:854711][30113:0x7ff5bd0cc700], t, file:__wt0002.wt, eviction-server: __verbose_dump_cache_single, 2532: internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854718][30113:0x7ff5bd0cc700], t, file:__wt0002.wt, eviction-server: __verbose_dump_cache_single, 2546: leaf: 2 pages, 0MB, 2/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854723][30113:0x7ff5bd0cc700], t, file:__wt0001.wt, eviction-server: __verbose_dump_cache_single, 2481: file:__wt0001.wt(<live>):
      [1590803721:854733][30113:0x7ff5bd0cc700], t, file:__wt0001.wt, eviction-server: __verbose_dump_cache_single, 2532: internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854738][30113:0x7ff5bd0cc700], t, file:__wt0001.wt, eviction-server: __verbose_dump_cache_single, 2534: leaf: 0 pages
      [1590803721:854742][30113:0x7ff5bd0cc700], t, file:__wt0000.wt, eviction-server: __verbose_dump_cache_single, 2481: file:__wt0000.wt(<live>):
      [1590803721:854762][30113:0x7ff5bd0cc700], t, file:__wt0000.wt, eviction-server: __verbose_dump_cache_single, 2532: internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854766][30113:0x7ff5bd0cc700], t, file:__wt0000.wt, eviction-server: __verbose_dump_cache_single, 2546: leaf: 2 pages, 0MB, 2/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854771][30113:0x7ff5bd0cc700], t, file:WiredTigerHS.wt, eviction-server: __verbose_dump_cache_single, 2481: file:WiredTigerHS.wt(<live>):
      [1590803721:854809][30113:0x7ff5bd0cc700], t, file:WiredTigerHS.wt, eviction-server: __verbose_dump_cache_single, 2532: internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854819][30113:0x7ff5bd0cc700], t, file:WiredTigerHS.wt, eviction-server: __verbose_dump_cache_single, 2546: leaf: 1 pages, 87MB, 0/1 clean/dirty pages, 0/87 clean/dirty MB, 87MB max page, 87MB max dirty page
      [1590803721:854827][30113:0x7ff5bd0cc700], t, file:WiredTiger.wt, eviction-server: __verbose_dump_cache_single, 2481: file:WiredTiger.wt(<live>):
      [1590803721:854835][30113:0x7ff5bd0cc700], t, file:WiredTiger.wt, eviction-server: __verbose_dump_cache_single, 2532: internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [1590803721:854840][30113:0x7ff5bd0cc700], t, file:WiredTiger.wt, eviction-server: __verbose_dump_cache_single, 2534: leaf: 0 pages
      [1590803721:854844][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2623: cache dump: total found: 94MB vs tracked inuse 95MB
      [1590803721:854848][30113:0x7ff5bd0cc700], t, eviction-server: __wt_verbose_dump_cache, 2624: total dirty bytes: 87MB
      [1590803721:854853][30113:0x7ff5bd0cc700], t, eviction-server: __wt_evict_thread_run, 328: cache eviction thread error: Connection timed out
      

      The core issue here is that we've let a history store page grow to 87MB:

      [1590803721:854819][30113:0x7ff5bd0cc700], t, file:WiredTigerHS.wt, eviction-server: __verbose_dump_cache_single, 2546: leaf: 1 pages, 87MB, 0/1 clean/dirty pages, 0/87 clean/dirty MB, 87MB max page, 87MB max dirty page
      

      I have added a thread to test/checkpoint that lags the oldest transaction ID, to make it more commonly use the history store, which I suspect is contributing to seeing (and reproducing) the failure. Diff attached.

      Last week we were evicting history store pages too aggressively - we addressed that in WT-6306 and WT-6229, but I suspect we've gone too far the other way now, and need to figure out how/when to get exclusive access to history store pages, so we can keep them to a reasonable size.

      sulabh.mahajan have you seen any evidence of large history store pages?

        1. large-page.png
          large-page.png
          20 kB
        2. test_checkpoint_lag_txnid.diff
          4 kB

            Assignee:
            sulabh.mahajan@mongodb.com Sulabh Mahajan
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: