- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cache and Eviction
- Storage Engines
An expected change in the disaggregated storage deployment model is that MongoDB will actively manage system load based on the current state of the system.
One of the ingredients for managing that load should be insight into how loaded (or overloaded) WiredTiger is at any point in time. WiredTiger should make available a metric that indicates how loaded the system is. Ideally that metric can distinguish between read load and write load.
This ticket is twofold:
- Decide how to expose the metric. Is it a few new statistics? Is it a new API on the connection handle? Is it something else?
- Implement a first version that reports busyness.
A good proxy for the first implementation might be a combination of existing cache utilization/management heuristics.
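One way such a proxy could look, as a sketch only: scale cache fill linearly between an eviction target (load 0) and an eviction trigger (load 100), using total bytes in the cache for read load and dirty bytes for write load. The linear mapping, the `CACHE_SNAPSHOT` struct, and the helper names are assumptions made for illustration and are not taken from the ticket. The threshold values 80/95 and 5/20 mirror the documented defaults for `eviction_target`/`eviction_trigger` and `eviction_dirty_target`/`eviction_dirty_trigger`; a real implementation would read the configured values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of cache usage, in bytes. */
typedef struct {
    uint64_t cache_size;
    uint64_t bytes_inuse;
    uint64_t bytes_dirty;
} CACHE_SNAPSHOT;

/*
 * Map a fill level onto 0..100: at or below the target is 0, at or above
 * the trigger is 100, linear in between. Assumes bytes * 100 does not
 * overflow 64 bits, which holds for any realistic cache size.
 */
static uint32_t
scale_load(uint64_t bytes, uint64_t cache_size, uint32_t target_pct, uint32_t trigger_pct)
{
    uint64_t pct;

    assert(cache_size > 0 && target_pct < trigger_pct);
    pct = (bytes * 100) / cache_size;
    if (pct <= target_pct)
        return (0);
    if (pct >= trigger_pct)
        return (100);
    return ((uint32_t)(((pct - target_pct) * 100) / (trigger_pct - target_pct)));
}

/* Derive read and write load from one snapshot (assumed thresholds, see above). */
static void
cache_proxy_load(const CACHE_SNAPSHOT *c, uint32_t *read_loadp, uint32_t *write_loadp)
{
    *read_loadp = scale_load(c->bytes_inuse, c->cache_size, 80, 95);
    *write_loadp = scale_load(c->bytes_dirty, c->cache_size, 5, 20);
}
```

For example, a cache that is 90% full with 10% dirty content yields a read load of 66 and a write load of 33 under these assumed thresholds.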
--- a/src/evict/evict_lru.c
+++ b/src/evict/evict_lru.c
@@ -2904,3 +2904,47 @@ __wt_verbose_dump_cache(WT_SESSION_IMPL *session)
 
     return (0);
 }
+
+/*
+ * __wt_evict_load_level
+ *     Retrieve the current load level in the system according to the eviction implementation
+ *     that manages cache pressure. Eventually this should form part of the total system
+ *     load calculation (there will be causes for load that exist outside the cache).
+ *     The load level is a range between 0 and 100, with 0 corresponding to virtually no load
+ *     and 100 corresponding to completely overwhelmed (unable to complete current work).
+ */
+int
+__wt_evict_load_level(WT_SESSION_IMPL *session, uint32_t *read_loadp, uint32_t *write_loadp)
+{
+    WT_CACHE *cache;
+    WT_DECL_RET;
+    struct timespec now;
+    uint32_t read_load, write_load;
+    uint64_t time_diff_ms;
+
+    cache = S2C(session)->cache;
+    read_load = write_load = 0;
+
+    if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN))
+        read_load += 10;
+    if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN_HARD))
+        read_load += 20;
+
+    /* Take into account application threads contributing to eviction */
+
+    /* Take into account application threads waiting for space in the cache */
+
+    if (__wt_cache_stuck(session)) {
+        /* Stuck is always bad, but stuck for a long time gets more aggressively bad. */
+        __wt_epoch(session, &now);
+        time_diff_ms = WT_TIMEDIFF_MS(now, cache->stuck_time);
+        if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN_HARD))
+            read_load += 30 + (time_diff_ms / 10);
+        if (F_ISSET(cache, WT_CACHE_EVICT_DIRTY_HARD))
+            write_load += 30 + (time_diff_ms / 10);
+    }
+
+    *read_loadp = WT_MIN(read_load, 100);
+    *write_loadp = WT_MIN(write_load, 100);
+    return (0);
+}