Type: Improvement
Resolution: Done
Priority: Critical - P2
Affects Version/s: None
Component/s: WiredTiger
Fully Compatible
When cache utilization reaches 95%, performance falls off a cliff, severely impacting production.
If the solution to this isn't to (gently) keep utilization from hitting 95%, then do we need to look at why application threads getting involved in eviction at 95% is so impactful? Note that in the incident on the primary that bruce.lucas analyzed, it appeared to me that the shortfall between the evictions required to keep the cache steady and the actual evictions was only 0.5%, yet the impact on operation rates and latencies of getting application threads involved in eviction seemed far out of proportion to the shortfall they had to make up.
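To put the 0.5% figure in concrete terms, here is a small Python sketch of the arithmetic. The function names and the example rates (20,000 required vs 19,900 actual evictions per second) are hypothetical illustrations, not WiredTiger statistics; only the 0.5% ratio comes from the incident described above.

```python
def eviction_shortfall(required_per_sec: float, actual_per_sec: float) -> float:
    """Fraction of the required evictions that background eviction failed to perform."""
    if required_per_sec <= 0:
        return 0.0
    return max(0.0, (required_per_sec - actual_per_sec) / required_per_sec)


def app_thread_eviction_load(required_per_sec: float, actual_per_sec: float) -> float:
    """Pages per second that application threads must evict to hold cache usage steady."""
    return max(0.0, required_per_sec - actual_per_sec)


if __name__ == "__main__":
    required, actual = 20_000.0, 19_900.0  # hypothetical rates, chosen to give a 0.5% shortfall
    print(f"shortfall = {eviction_shortfall(required, actual):.3%}")  # 0.500%
    print(f"app-thread evictions = {app_thread_eviction_load(required, actual):.0f} pages/s")  # 100
```

With these assumed rates, application threads would need to evict only about 100 pages per second in aggregate, which is the sense in which the observed impact looks disproportionate to the work being made up.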
If, on the other hand, eviction is so fundamentally difficult that increasing the eviction rate by 0.5% is hard, does it make sense to approach it from the other end: throttling application threads by the 0.5% required (in this example) to make up the shortfall, by very slightly reducing the rate at which pages are read into cache? A similar analysis of the lag incident on the secondary showed a shortfall of about 9%, yet making up that shortfall once the cache hit 95% utilization nearly brought replication to a halt for extended periods.
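One way to picture the read-side throttle is as a multiplier on the page read-in rate. In steady state the pages read into cache must equal the pages evicted, so if the shortfall is s = (R - E) / R for read rate R and sustainable eviction rate E, scaling reads by (1 - s) gives R(1 - s) = E. The Python sketch below assumes a hypothetical soft threshold of 90% at which throttling starts ramping in, reaching the full (1 - s) factor at the 95% point where application threads would otherwise start evicting. Neither `throttle_factor` nor the 90% threshold is an existing WiredTiger function or setting.

```python
def throttle_factor(cache_fill: float, shortfall: float,
                    soft_threshold: float = 0.90,
                    hard_threshold: float = 0.95) -> float:
    """Return a multiplier in (0, 1] for the page read-in rate (hypothetical sketch).

    cache_fill: fraction of the cache in use (0.0 to 1.0).
    shortfall: fraction of required evictions the eviction server misses.
    soft_threshold: assumed utilization at which throttling begins.
    hard_threshold: utilization at which application threads would start evicting.
    """
    if not 0.0 <= shortfall < 1.0:
        raise ValueError("shortfall must be in [0, 1)")
    if cache_fill <= soft_threshold:
        return 1.0  # plenty of headroom: do not slow reads at all
    # Fraction of the way from the soft threshold to the hard threshold, capped at 1.
    ramp = min(1.0, (cache_fill - soft_threshold) / (hard_threshold - soft_threshold))
    # At the hard threshold, reads are slowed by exactly the shortfall, so the
    # read-in rate R * (1 - s) equals the sustainable eviction rate E.
    return 1.0 - ramp * shortfall


if __name__ == "__main__":
    for fill in (0.80, 0.925, 0.95, 0.99):
        print(f"fill={fill:.3f} primary={throttle_factor(fill, 0.005):.4f} "
              f"secondary={throttle_factor(fill, 0.09):.4f}")
```

For the primary incident (s = 0.005) the multiplier at 95% is 0.995; for the secondary (s = 0.09) it would be 0.91. The linear ramp is one arbitrary way to make the throttle "gentle"; any monotone curve that reaches 1 - s by the trigger point would serve the same purpose.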
depends on
- WT-2702 Under high thread load, WiredTiger exceeds cache size (Closed)

is duplicated by
- SERVER-23001 Occasional 100% cache uses cripples server (Closed)
- SERVER-24094 Server cache use can take up to an hour to recover from heavy load (Closed)
- SERVER-24139 Insert speed decrease rapidly (Closed)
- SERVER-24983 Remove method really slow (Closed)