  1. Core Server
  2. SERVER-26001

Insert workload stalled at 96% cache utilization



    • Bug
    • Status: Closed
    • Critical - P2
    • Resolution: Duplicate
    • 3.2.9
    • None
    • WiredTiger
    • None
    • Fully Compatible
    • Storage 2016-10-31


      Note: this may be the same underlying issue as SERVER-25974, but some of the metrics appear to differ, and this ticket has a specific, simple repro that is not clearly tied to the customer issue seen on SERVER-25974. Opening as a separate ticket for now, unless and until we can demonstrate that they are the same issue.

      The insert repro workload from SERVER-20306, also attached to this ticket as repro-32-insert.sh, gets stuck with cache utilization at about 96%:

      • problems start at A, almost completely stuck at B
      • ftdc stalls ("ftdc samples/s") suggest that application threads are sometimes getting stuck for extended periods doing evictions
      • application threads seem to be starved for work to do:
        • "pages walked for eviction" has gone up but "pages seen by eviction walks" has gone down
        • application threads are often finding the queue empty
        • pages evicted by application threads is not high
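The comparisons above can be sketched as a small helper that works on two cache-statistics snapshots. This is a minimal illustration, not the tooling used to produce the attached charts. It assumes the snapshots are plain dictionaries shaped like the `wiredTiger.cache` section of `serverStatus`, and that the keys `bytes currently in the cache` and `maximum bytes configured` are present there. The function names and all sample numbers are hypothetical and are not taken from diagnostic.tar.

```python
from typing import Dict, Mapping, Sequence

# Counter names quoted in this ticket; assumed to be keys of wiredTiger.cache.
EVICTION_COUNTERS = (
    "pages walked for eviction",
    "pages seen by eviction walks",
    "pages evicted by application threads",
)


def cache_utilization(cache: Mapping[str, float]) -> float:
    """Return cache fill as a fraction of the configured maximum."""
    return cache["bytes currently in the cache"] / cache["maximum bytes configured"]


def counter_deltas(
    before: Mapping[str, float],
    after: Mapping[str, float],
    names: Sequence[str] = EVICTION_COUNTERS,
) -> Dict[str, float]:
    """Return how much each cumulative counter grew between two samples."""
    return {name: after[name] - before[name] for name in names}


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    before = {
        "bytes currently in the cache": 900,
        "maximum bytes configured": 1000,
        "pages walked for eviction": 5000,
        "pages seen by eviction walks": 4000,
        "pages evicted by application threads": 100,
    }
    after = {
        "bytes currently in the cache": 960,
        "maximum bytes configured": 1000,
        "pages walked for eviction": 9000,
        "pages seen by eviction walks": 4500,
        "pages evicted by application threads": 110,
    }
    print("utilization:", cache_utilization(after))
    for name, delta in counter_deltas(before, after).items():
        print(name, "grew by", delta)
```

Dividing each delta by the time between the two samples gives a per-second rate, which is the form in which rising and falling counters like these are usually compared.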

      I've also attached stack traces captured during the stuck period, although I don't think they give much insight.


        1. david-screenshot.png
          311 kB
          David Hows
        2. diagnostic.tar
          500 kB
          Bruce Lucas
        3. repro-32-insert.sh
          1 kB
          Bruce Lucas
        4. stacks1.txt
          168 kB
          Bruce Lucas
        5. stacks2.txt
          183 kB
          Bruce Lucas
        6. stacks3.txt
          183 kB
          Bruce Lucas
        7. stuck.png
          236 kB
          Bruce Lucas

              david.hows David Hows
              bruce.lucas@mongodb.com Bruce Lucas