Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Cache and Eviction
Labels:
- Disag_Storage
- or-workload-management

Epic Link: WT-14488
Assigned Teams: Storage Engines
Sprint: None
Story Points: None

An expected change in the disaggregated storage deployment model is that MongoDB will actively manage system load based on the current state of the system.

One of the ingredients for managing that load should be insight into how loaded (or overloaded) WiredTiger is at any point in time. WiredTiger should make available a metric that indicates how loaded the system is. Ideally that metric can distinguish between read load and write load.

This ticket is twofold:

1. Decide how to expose the metric. Is it a few new statistics? Is it a new API on the connection handle? Is it something else?
2. Implement a first version that reports busyness.

A good proxy for the first implementation might be a combination of existing cache utilization/management heuristics.

--- a/src/evict/evict_lru.c
+++ b/src/evict/evict_lru.c
@@ -2904,3 +2904,46 @@ __wt_verbose_dump_cache(WT_SESSION_IMPL *session)
 
     return (0);
 }
+
+/*
+ * __wt_evict_load_level
+ *     Retrieve the current load level in the system according to the eviction implementation
+ *     that manages cache pressure. Eventually this should form part of the total system
+ *     load calculation (there will be causes for load that exist outside the cache).
+ *     The load level is a range between 0 and 100, with 0 corresponding to virtually no load
+ *     and 100 corresponding to completely overwhelmed (unable to complete current work).
+ */
+int
+__wt_evict_load_level(WT_SESSION_IMPL *session, uint32_t *read_loadp, uint32_t *write_loadp)
+{
+    WT_CACHE *cache;
+    struct timespec now;
+    uint64_t time_diff_ms;
+    uint32_t read_load, write_load;
+
+    cache = S2C(session)->cache;
+    read_load = write_load = 0;
+
+    if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN))
+        read_load += 10;
+    if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN_HARD))
+        read_load += 20;
+
+    /* Take into account application threads contributing to eviction */
+
+    /* Take into account application threads waiting for space in the cache */
+
+    if (__wt_cache_stuck(session)) {
+        /* Stuck is always bad, but stuck for a long time gets more aggressively bad. */
+        __wt_epoch(session, &now);
+        time_diff_ms = WT_TIMEDIFF_MS(now, cache->stuck_time);
+        if (F_ISSET(cache, WT_CACHE_EVICT_CLEAN_HARD))
+            read_load += 30 + (uint32_t)WT_MIN(time_diff_ms / 10, 100);
+        if (F_ISSET(cache, WT_CACHE_EVICT_DIRTY_HARD))
+            write_load += 30 + (uint32_t)WT_MIN(time_diff_ms / 10, 100);
+    }
+
+    *read_loadp = WT_MIN(read_load, 100);
+    *write_loadp = WT_MIN(write_load, 100);
+    return (0);
+}
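
To make the scoring in the draft patch easier to reason about, here is a minimal standalone sketch of the same arithmetic. It does not use WiredTiger itself: `struct load_inputs`, `clamp_load` and `model_load_level` are hypothetical names introduced only for illustration, and the boolean fields are assumed stand-ins for the cache flags and the stuck check that the patch reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the eviction state the draft patch inspects. */
struct load_inputs {
    bool evict_clean;      /* models WT_CACHE_EVICT_CLEAN being set */
    bool evict_clean_hard; /* models WT_CACHE_EVICT_CLEAN_HARD being set */
    bool evict_dirty_hard; /* models WT_CACHE_EVICT_DIRTY_HARD being set */
    bool stuck;            /* models __wt_cache_stuck() returning true */
    uint64_t stuck_ms;     /* milliseconds since the cache became stuck */
};

/* Clamp a raw score to the 0-100 range used by the draft patch. */
static uint32_t
clamp_load(uint64_t load)
{
    return (load > 100 ? 100 : (uint32_t)load);
}

/* Mirror the draft scoring: flag-based read load, plus a stuck-time penalty. */
static void
model_load_level(const struct load_inputs *in, uint32_t *read_loadp, uint32_t *write_loadp)
{
    uint64_t read_load, write_load;

    assert(in != NULL && read_loadp != NULL && write_loadp != NULL);
    read_load = write_load = 0;

    if (in->evict_clean)
        read_load += 10;
    if (in->evict_clean_hard)
        read_load += 20;

    if (in->stuck) {
        /* A stuck cache adds a base of 30 plus 1 point per 10ms spent stuck. */
        if (in->evict_clean_hard)
            read_load += 30 + in->stuck_ms / 10;
        if (in->evict_dirty_hard)
            write_load += 30 + in->stuck_ms / 10;
    }

    *read_loadp = clamp_load(read_load);
    *write_loadp = clamp_load(write_load);
}
```

Under this model, a cache with both clean flags set that has been stuck for 200ms reports a read load of 10 + 20 + 30 + 20 = 80 and a write load of 0. It also shows that, as drafted, the write load stays at 0 until the cache is stuck with the hard dirty flag set, which may be worth revisiting when the placeholder comments about application threads are filled in.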

Assignee: Unassigned
Reporter: Alexander Gorrod
Votes: 0
Watchers: 5

Created: Sep 24 2024 05:43:15 AM UTC
Updated: Aug 16 2025 12:09:00 AM UTC