Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-5489

page-read can race with threads locking in-memory page structures

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • WT10.0.0, 4.3.4, 4.7.0
    • Affects Version/s: None
    • Component/s: None
    • 5
    • Storage Engines 2020-02-10
    • v4.2, v4.0

      There have been a lot of failures lately in both Jenkins and Evergreen testing where test/format fails with its 15 minute timeout and the cache dump shows that the cache is nearly empty.

      Here's one particular example where the clean cache is only 2% full and the dirty cache is < 1% full.
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-sanitizer-ppc/9182/console

      format run more than 15 minutes past the maximum time
      format run dumping cache and transaction state, then aborting the process
      0x3fffa9c10800:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      0x3fffa9c10800:transaction state dump
      0x3fffa9c10800:current ID: 51
      0x3fffa9c10800:last running ID: 14
      0x3fffa9c10800:metadata_pinned ID: 5
      0x3fffa9c10800:oldest ID: 5
      0x3fffa9c10800:durable timestamp: (0, 2421)
      0x3fffa9c10800:oldest timestamp: (0, 2421)
      0x3fffa9c10800:pinned timestamp: (0, 232)
      0x3fffa9c10800:stable timestamp: (0, 0)
      0x3fffa9c10800:has_durable_timestamp: yes
      0x3fffa9c10800:has_oldest_timestamp: yes
      0x3fffa9c10800:has_pinned_timestamp: yes
      0x3fffa9c10800:has_stable_timestamp: no
      0x3fffa9c10800:oldest_is_pinned: no
      0x3fffa9c10800:stable_is_pinned: no
      0x3fffa9c10800:checkpoint running: no
      0x3fffa9c10800:checkpoint generation: 13
      0x3fffa9c10800:checkpoint pinned ID: 0
      0x3fffa9c10800:checkpoint txn ID: 0
      0x3fffa9c10800:oldest named snapshot ID: 0
      0x3fffa9c10800:session count: 21
      0x3fffa9c10800:Transaction state of active sessions:
      0x3fffa9c10800:ID: 0, pinned ID: 5, metadata pinned ID: 0, name: WT_CURSOR.next
      0x3fffa9c10800:transaction id: 0, mod count: 0, snap min: 5, snap max: 5, snapshot count: 0, commit_timestamp: (0, 0), durable_timestamp: (0, 0), first_commit_timestamp: (0, 0), prepare_timestamp: (0, 0), read_timestamp: (0, 0), checkpoint LSN: [0][0], full checkpoint: false, rollback reason: , flags: 0x00000008, isolation: WT_ISO_READ_COMMITTED
      0x3fffa9c10800:ID: 14, pinned ID: 13, metadata pinned ID: 0, name: WT_CURSOR.search
      0x3fffa9c10800:transaction id: 14, mod count: 3, snap min: 13, snap max: 13, snapshot count: 0, commit_timestamp: (0, 232), durable_timestamp: (0, 232), first_commit_timestamp: (0, 232), prepare_timestamp: (0, 0), read_timestamp: (0, 234), checkpoint LSN: [0][0], full checkpoint: false, rollback reason: , flags: 0x0000288c, isolation: WT_ISO_SNAPSHOT
      0x3fffa9c10800:ID: 0, pinned ID: 14, metadata pinned ID: 0, name: WT_CURSOR.search
      0x3fffa9c10800:transaction id: 0, mod count: 0, snap min: 14, snap max: 22, snapshot count: 1, commit_timestamp: (0, 255), durable_timestamp: (0, 255), first_commit_timestamp: (0, 255), prepare_timestamp: (0, 0), read_timestamp: (0, 232), checkpoint LSN: [0][0], full checkpoint: false, rollback reason: , flags: 0x00002888, isolation: WT_ISO_SNAPSHOT
      0x3fffa9c10800:ID: 38, pinned ID: 14, metadata pinned ID: 0, name: WT_CURSOR.reserve
      0x3fffa9c10800:transaction id: 38, mod count: 1, snap min: 14, snap max: 38, snapshot count: 1, commit_timestamp: (0, 2421), durable_timestamp: (0, 2421), first_commit_timestamp: (0, 2421), prepare_timestamp: (0, 0), read_timestamp: (0, 2423), checkpoint LSN: [0][0], full checkpoint: false, rollback reason: , flags: 0x0100288c, isolation: WT_ISO_SNAPSHOT
      0x3fffa9c10800:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      0x3fffa9c10800:cache dump
      0x3fffa9c10800:cache full: no
      0x3fffa9c10800:cache clean check: no (2.170%)
      0x3fffa9c10800:cache dirty check: no (0.862%)
      0x3fffa9c10800:file:wt(<live>):
      0x3fffa9c10800:internal: 205 pages, 0MB, 205/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      0x3fffa9c10800:leaf: 149 pages, 0MB, 68/81 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      0x3fffa9c10800:file:WiredTigerLAS.wt(<live>) eviction disabled at open:
      0x3fffa9c10800:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      0x3fffa9c10800:leaf: 0 pages
      0x3fffa9c10800:file:WiredTiger.wt(<live>):
      0x3fffa9c10800:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      0x3fffa9c10800:leaf: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      0x3fffa9c10800:cache dump: total found: 1MB vs tracked inuse 1MB
      0x3fffa9c10800:total dirty bytes: 0MB
      WiredTiger Error: aborting WiredTiger library
      

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: