  Core Server / SERVER-16938

60-second stall between checkpoints under WiredTiger

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.8.0-rc5
    • Component/s: Storage, WiredTiger
    • Labels:
    • Fully Compatible
    • ALL

      During one 10-minute run of a heavy mixed workload, I observed a 60-second stall that appears to coincide with the interval between the end of one checkpoint and the start of the next.

      • From A to B, throughput drops to near 0.
      • mongod log shows a handful of ops completing throughout this period, with increasing latencies suggesting that they have been waiting since A.
      • the "page acquire time sleeping" statistic suggests that most threads (about 93 out of 100) are waiting for access to pages
      • throughout this period 40 pages per second are being evicted because they exceeded in-memory maximum
      • yet cache statistics show nothing leaving the cache and no change in cache sizes during this period
      • at the end of the period about 2500 failed evictions are reported within 1 second. This is roughly the number of pages evicted during that period, i.e. 60 seconds * 40 pages / second = 2400. Is that a coincidence, or are the failed evictions reported at the end of the period the same evictions that were reported throughout the period?
      • the 60-second stall appears to coincide with the time between the end of one checkpoint and the start of the next.
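
      The comparison between the failed evictions and the evictions reported during the stall can be sanity-checked with a few lines of Python. This is only a sketch of the arithmetic: the three input values come from the observations above, while the function names and the idea of expressing the gap as a relative difference are illustrative assumptions, not part of any mongod or WiredTiger tooling.

```python
# Values taken from the observations in this report (all approximate).
STALL_SECONDS = 60               # length of the stall
EVICTIONS_PER_SECOND = 40        # pages evicted for exceeding the in-memory maximum
FAILED_EVICTIONS_AT_END = 2500   # failed evictions reported in the final second


def expected_evictions(duration_s, rate_per_s):
    """Total evictions implied by a constant rate over the stall (hypothetical helper)."""
    return duration_s * rate_per_s


def relative_difference(observed, expected):
    """Gap between observed and expected counts, as a fraction of expected."""
    return abs(observed - expected) / expected


expected = expected_evictions(STALL_SECONDS, EVICTIONS_PER_SECOND)
diff = relative_difference(FAILED_EVICTIONS_AT_END, expected)

print(f"evictions implied by the rate: {expected}")
print(f"failed evictions reported:     ~{FAILED_EVICTIONS_AT_END}")
print(f"relative difference:           {diff:.1%}")
```

      The two counts differ by only a few percent, which is within the precision of the "about 2500" and "40 pages per second" readings. That is consistent with, but does not prove, the hypothesis that the same evictions are being counted twice.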

        1. 60s-stall-between-checkpoints.png
          290 kB
        2. throttle-rc9-01-gdb.html
          64.43 MB
        3. throttle-rc9-01.html
          2.69 MB
        4. throttle-rc9-01.png
          348 kB

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            1
            Watchers:
            11

              Created:
              Updated:
              Resolved: