  1. Core Server
  2. SERVER-25974

Application threads stall for extended period when cache fills

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.2.9, 3.3.10
    • Fix Version/s: 3.2.10, 3.3.15
    • Component/s: WiredTiger
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Under some conditions when cache utilization reaches 100%, the system can go into a state where

      • operation rate falls to near-zero levels
      • application threads stall for tens of seconds, apparently attempting but not succeeding in evicting pages.

      This state can persist for many minutes.
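The two symptoms above (cache at 100% and a near-zero operation rate) can be checked mechanically from two consecutive `serverStatus` samples. The following is a minimal sketch, not the diagnostic method used in this ticket. It assumes the samples are plain dicts shaped like `serverStatus` output, with the `wiredTiger.cache` fields "bytes currently in the cache" and "maximum bytes configured" and the `opcounters` section. The helper names and the thresholds (`full_ratio`, `min_ops_per_sec`) are hypothetical and chosen only for illustration.

```python
def cache_utilization(server_status):
    """Return the WiredTiger cache fill ratio from a serverStatus-style dict."""
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    return used / limit


def looks_stalled(prev, curr, interval_secs,
                  full_ratio=0.99, min_ops_per_sec=1.0):
    """Heuristic: cache is (nearly) full and opcounters barely moved.

    prev and curr are two serverStatus-style samples taken interval_secs
    apart. The default thresholds are illustrative assumptions, not values
    taken from the ticket.
    """
    ops_prev = sum(prev["opcounters"].values())
    ops_curr = sum(curr["opcounters"].values())
    ops_per_sec = (ops_curr - ops_prev) / interval_secs
    return (cache_utilization(curr) >= full_ratio
            and ops_per_sec < min_ops_per_sec)
```

In practice the samples might come from something like `client.admin.command("serverStatus")` in a driver, polled every few seconds. A single positive result is weak evidence; the state described here persists for many minutes, so repeated positives are the signal of interest.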

      1. server-25974-rc0-vs-rc1-dd.tgz
        15.46 MB
        Michael Cahill
      1. 3.2.10-rc0_event.png
        228 kB
      2. 3.2.10-rc0-timeseries.png
        273 kB
      3. 3.2.10-rc1-timeseries.png
        256 kB
      4. issue.png
        216 kB
      5. repro.png
        228 kB
      6. s25974_new_connections.png
        63 kB
      7. s25974_oplog_cache.png
        59 kB
      8. s25974_seen_queue.png
        70 kB
      9. Screen Shot 2016-09-18 at 2.18.01 PM.png
        318 kB
      10. Screen Shot 2016-09-18 at 4.38.53 PM.png
        193 kB
      11. test5-io-saturation.png
        192 kB

          Activity

          bruce.lucas Bruce Lucas added a comment (edited)

          A simple insert workload shows a similar problem; I opened SERVER-26001 to track it. The two are alike in that ops stall for an extended period. However, there are some differences that make it unclear whether they are in fact the same issue.

          • here: cache utilization is 100%; there: stuck at 96%
          • here: application threads appear to be getting plenty of work to do but the attempted evictions are failing; there: application threads appear to be starved of work to do.
          • here: complex customer workload; there: simple synthetic insert-only workload
          ramon.fernandez Ramon Fernandez added a comment

          The underlying technical problem was addressed in WT-2924.


            People

            • Votes: 3
            • Watchers: 30
