Running sysbench shows the following timeline:
- At A the cache has reached its configured limit (actually 85%, I believe). At B the insert phase of the benchmark ends and the update phase begins.
- Second row shows allocated bytes as reported by tcmalloc. Per the documentation, I believe this is the bytes requested by the app, so it does not include any tcmalloc overhead. This reaches approximately 6 GB.
- Third row shows bytes in cache as reported by WT, which reaches 5 GB, about 20% less than the allocated bytes.
- Last row shows difference between bytes in cache and allocated bytes. This grows in proportion to the number of bytes in the cache at a rate of about 20%, and stops growing at the point where bytes in cache stops growing. This suggests that WT allocates about 20% more bytes than are accounted for in the "bytes currently in cache" statistic when it is doing inserts.
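To make the arithmetic concrete, here is a minimal sketch using the approximate figures above (6 GB allocated, 5 GB in cache). The function name and the choice to express the gap relative to bytes in cache are assumptions for illustration, not anything taken from WT or tcmalloc.

```python
GB = 1024 ** 3

def unaccounted_overhead(allocated_bytes, cache_bytes):
    """Return (bytes not accounted for, that gap as a fraction of bytes in cache)."""
    diff = allocated_bytes - cache_bytes
    return diff, diff / cache_bytes

# Approximate end-of-insert-phase values read off the timeline.
diff, ratio = unaccounted_overhead(6 * GB, 5 * GB)
print(f"{diff / GB:.1f} GB unaccounted, {ratio:.0%} over bytes in cache")
```

Measured against bytes in cache the gap is 20%; measured against allocated bytes the same 1 GB is closer to 17%, so the base matters when comparing runs.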
Call stack data was obtained by using perf to monitor calls to all tcmalloc entry points. Here's a partially expanded reversed call tree for a shorter run, up to 449 MB allocated by WT. The timeline shows currently active memory (i.e. accounting for both allocate and free) charged to each call site over the course of the run. Read the maximum bytes active during the run for each call site from the "max.MB" column. Full call tree attached as well.
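For readers unfamiliar with this kind of accounting, the sketch below shows one way "currently active" and "max active" bytes per call site can be derived from a stream of allocate and free events. The event tuple format and the function name are hypothetical and chosen only for illustration; this is not the actual tooling or perf output format used for the data above.

```python
from collections import defaultdict

def max_active_by_site(events):
    """events: iterable of (op, address, size, call_site) tuples (hypothetical format).

    Frees are charged back to the call site that made the matching allocation.
    """
    live = {}                    # address -> (call_site, size) for outstanding allocations
    active = defaultdict(int)    # call_site -> bytes currently active
    peak = defaultdict(int)      # call_site -> max bytes active seen so far
    for op, addr, size, site in events:
        if op == "alloc":
            live[addr] = (site, size)
            active[site] += size
            peak[site] = max(peak[site], active[site])
        elif op == "free" and addr in live:
            owner, owned = live.pop(addr)
            active[owner] -= owned
    return dict(peak)

events = [
    ("alloc", 0x1000, 100, "site_a"),
    ("alloc", 0x2000, 50, "site_b"),
    ("free", 0x1000, 0, None),
    ("alloc", 0x3000, 30, "site_a"),
]
print(max_active_by_site(events))
```

The per-site peak is what a column like "max.MB" would report after converting to megabytes, while the running `active` value corresponds to what the timeline plots.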