- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cache and Eviction
- Storage Engines - Transactions
- 21.14
- 5
__wt_shared_dsk_cache uses TAILQ for the hash table in cross-checkpoint caching, so every bucket stores both a head pointer and a tail pointer:
#define TAILQ_HEAD(name, type) \
struct name { \
struct type *tqh_first; /* first element */\
struct type **tqh_last; /* addr of last next element */\
TRACEBUF\
}
However, cross-checkpoint caching never uses tqh_last, so the tail pointer doubles the size of every bucket head. The shared disk cache sizes its bucket array in proportion to the cache size, so a 100GB cache allocates ~2M buckets:
hash_size = max(cache_size / 500 / (sizeof(item) + sizeof(bucket_head))
This wastes 8B/bucket * 2M buckets = 16MB of RAM on a node with a 100GB cache.
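The per-bucket cost can be checked with a minimal sketch. The struct and function names below (tailq_bucket, list_bucket, bucket_bytes_saved) are hypothetical and not taken from the WiredTiger source. The sketch assumes TRACEBUF expands to nothing, as in a non-debug build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type; the bucket heads only store pointers to it. */
struct item;

/* Same layout as TAILQ_HEAD (assuming an empty TRACEBUF): two pointers. */
struct tailq_bucket {
    struct item *tqh_first; /* first element */
    struct item **tqh_last; /* addr of last next element */
};

/* Same layout as a LIST-style head: only the first-element pointer. */
struct list_bucket {
    struct item *lh_first; /* first element */
};

/* Bytes saved over the whole bucket array by dropping the tail pointer. */
static uint64_t
bucket_bytes_saved(uint64_t bucket_count)
{
    return (bucket_count *
      (uint64_t)(sizeof(struct tailq_bucket) - sizeof(struct list_bucket)));
}
```

On a 64-bit build, bucket_bytes_saved(2000000) evaluates to 16,000,000 bytes, which matches the 16MB estimate above.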
We should use LIST instead: its head holds only the first-element pointer, which avoids this extra memory usage.
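As an illustration only, a hash table with LIST-style buckets could look like the sketch below. It hand-rolls the same pointer manipulation that LIST_INSERT_HEAD and LIST_REMOVE perform, so it does not depend on any particular queue.h. All names (HASH_SIZE, struct item, hash_insert, hash_remove, hash_lookup) are hypothetical and do not reflect the actual shared disk cache code.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 8 /* hypothetical bucket count, kept small for the sketch */

struct item {
    uint64_t key;
    struct item *next;   /* next element in the same bucket */
    struct item **prevp; /* address of the pointer that points at this item */
};

struct bucket {
    struct item *first; /* one pointer per bucket, no tail pointer */
};

static struct bucket table[HASH_SIZE];

/* Insert at the head of the bucket; no tail pointer is needed. */
static void
hash_insert(struct item *it)
{
    struct bucket *b = &table[it->key % HASH_SIZE];

    if ((it->next = b->first) != NULL)
        b->first->prevp = &it->next;
    b->first = it;
    it->prevp = &b->first;
}

/* Unlink in O(1) using the back-pointer, wherever the item sits in the chain. */
static void
hash_remove(struct item *it)
{
    if (it->next != NULL)
        it->next->prevp = it->prevp;
    *it->prevp = it->next;
}

/* Walk the bucket chain from the head until the key matches. */
static struct item *
hash_lookup(uint64_t key)
{
    struct item *it;

    for (it = table[key % HASH_SIZE].first; it != NULL; it = it->next)
        if (it->key == key)
            return (it);
    return (NULL);
}
```

This shape only fits callers that insert at the head and never append to the tail or traverse in reverse, which is the usage pattern this ticket describes.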
This improvement also applies to other hash tables across WiredTiger. For example, session->dhhash also appears never to touch tqh_last. Its size is fixed at 512 buckets, so with 256 open sessions we waste 512 * 256 * 8B = 1MB, which is much less than the memory wasted in the shared disk cache.