[SERVER-25974] Application threads stall for extended period when cache fills
Created: 06/Sep/16  Updated: 31/Jul/18  Resolved: 28/Sep/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.9, 3.3.10
Fix Version/s: 3.2.10, 3.3.15

Type: Bug
Priority: Critical - P2
Reporter: Bruce Lucas (Inactive)
Assignee: Alexander Gorrod
Resolution: Done
Votes: 3
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments:
  • PNG File 3.2.10-rc0-timeseries.png
  • PNG File 3.2.10-rc0_event.png
  • PNG File 3.2.10-rc1-timeseries.png
  • PNG File Screen Shot 2016-09-18 at 2.18.01 PM.png
  • PNG File Screen Shot 2016-09-18 at 4.38.53 PM.png
  • PNG File issue.png
  • PNG File repro.png
  • PNG File s25974_new_connections.png
  • PNG File s25974_oplog_cache.png
  • PNG File s25974_seen_queue.png
  • File server-25974-rc0-vs-rc1-dd.tgz
  • PNG File test5-io-saturation.png
Issue Links:
Duplicate
is duplicated by SERVER-26001 Insert workload stalled at 96% cache ... Closed
is duplicated by SERVER-26055 Server fills cache rapidly, even unde... Closed
is duplicated by SERVER-26700 Threads appear to hang after migratin... Closed
Related
related to WT-2924 Ensure we are doing eviction when thr... Closed
is related to WT-2894 Create workload that shows negative s... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

Under some conditions, when cache utilization reaches 100%, the system can enter a state in which:

  • the operation rate falls to near zero
  • application threads stall for tens of seconds, apparently attempting to evict pages without success.

This state can persist for many minutes.
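
One way to watch for the condition described above is to compute cache utilization from the `wiredTiger.cache` section of a `serverStatus` result. The following is a minimal sketch and not part of the original report: the function names and the 0.95 threshold are illustrative assumptions, and the sample figures are made up rather than taken from this ticket. It operates on a plain dictionary, so fetching the document from a live server is left out.

```python
def cache_fill_ratio(server_status):
    """Return the fraction of the configured WiredTiger cache in use.

    `server_status` is assumed to be a dict shaped like the output of the
    serverStatus command, with the statistics under wiredTiger.cache.
    """
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    if limit <= 0:
        raise ValueError("maximum bytes configured must be positive")
    return used / limit


def is_cache_nearly_full(server_status, threshold=0.95):
    """Flag a sample whose cache utilization is at or above `threshold`.

    The default threshold is a hypothetical choice for illustration only.
    """
    return cache_fill_ratio(server_status) >= threshold


# Made-up sample values, not measurements from this issue.
sample = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 1000,
            "maximum bytes configured": 1000,
        }
    }
}

if __name__ == "__main__":
    print(cache_fill_ratio(sample), is_cache_nearly_full(sample))
```

Sampling this ratio alongside the operation rate over time would show whether a drop in throughput coincides with the cache sitting at its limit.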



 Comments   
Comment by Ramon Fernandez Marina [ 28/Sep/16 ]

The underlying technical problem was addressed in WT-2924.

Comment by Bruce Lucas (Inactive) [ 07/Sep/16 ]

A simple insert workload shows a similar problem; I opened SERVER-26001 to track it. The two are alike in that operations stall for an extended period. However, there are some differences that make it unclear whether the issues are in fact the same:

  • here: cache utilization is 100%; there: stuck at 96%
  • here: application threads appear to be getting plenty of work to do but the attempted evictions are failing; there: application threads appear to be starved of work to do.
  • here: complex customer workload; there: simple synthetic insert-only workload