Use list to replace tailq for cross checkpoint caching table

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Cache and Eviction
    • Storage Engines - Transactions
    • 21.14
    • 5

      __wt_shared_dsk_cache uses TAILQ for the hash table in cross checkpoint caching, so each bucket carries both a head pointer and a tail pointer:

      #define TAILQ_HEAD(name, type)                                   \
      struct name {                                                    \
          struct type *tqh_first;  /* first element */                 \
          struct type **tqh_last;  /* addr of last next element */     \
          TRACEBUF                                                     \
      }

      However, cross checkpoint caching never uses tqh_last, so each bucket head is twice as large as it needs to be. The shared disk cache sizes its bucket array in proportion to the cache size, so a 100GB cache allocates ~2M buckets:

      hash_size = max(cache_size / 500 / (sizeof(item) + sizeof(bucket_head))

      The unused pointer therefore wastes 8B/bucket * 2M buckets = 16MB of RAM per 100GB node.
      We should use LIST instead to avoid this extra memory usage.
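      The saving can be checked with a small standalone sketch. It does not use WiredTiger's own queue macros or types: it assumes the system <sys/queue.h> (whose TAILQ_HEAD and LIST_HEAD have the same pointer layout as the macro quoted above), and every demo_* name is hypothetical. It also shows that head insertion and bucket lookup, the only operations assumed here, need no tail pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical item types; they stand in for the real cache entries. */
struct demo_tq_item {
    uint64_t key;
    TAILQ_ENTRY(demo_tq_item) hashq;
};
struct demo_list_item {
    uint64_t key;
    LIST_ENTRY(demo_list_item) hashq;
};

/* Bucket heads: TAILQ keeps first + last pointers, LIST keeps only first. */
TAILQ_HEAD(demo_tq_head, demo_tq_item);
LIST_HEAD(demo_list_head, demo_list_item);

#define DEMO_NBUCKETS 8

/* Bytes saved per bucket by switching the head from TAILQ to LIST. */
static size_t
demo_saved_per_bucket(void)
{
    return (sizeof(struct demo_tq_head) - sizeof(struct demo_list_head));
}

/* Total bytes saved for a bucket array of the given length. */
static uint64_t
demo_saved_total(uint64_t nbuckets)
{
    return (nbuckets * (uint64_t)demo_saved_per_bucket());
}

/* Set every bucket head to the empty list. */
static void
demo_table_init(struct demo_list_head *table)
{
    size_t i;

    for (i = 0; i < DEMO_NBUCKETS; i++)
        LIST_INIT(&table[i]);
}

/* Insert at the front of the bucket; no tail pointer is required. */
static void
demo_insert(struct demo_list_head *table, struct demo_list_item *item)
{
    assert(item != NULL);
    LIST_INSERT_HEAD(&table[item->key % DEMO_NBUCKETS], item, hashq);
}

/* Walk one bucket looking for a key; returns NULL if it is absent. */
static struct demo_list_item *
demo_lookup(struct demo_list_head *table, uint64_t key)
{
    struct demo_list_item *item;

    LIST_FOREACH(item, &table[key % DEMO_NBUCKETS], hashq)
        if (item->key == key)
            return (item);
    return (NULL);
}
```

      On a 64-bit build demo_saved_per_bucket() is one pointer (8B), so demo_saved_total(2000000) comes to 16,000,000 bytes, matching the 16MB estimate. Any code path that appends to the tail of a bucket would need to be changed before switching to LIST.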

      This improvement could also apply to other hash tables across WiredTiger. For example, session->dhhash also seems never to touch tqh_last. However, its size is fixed at 512 buckets, so with 256 open sessions we waste 512 * 256 * 8B = 1MB, which is much less than the memory wasted in the shared disk cache.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Zunyi Liu