Under certain workloads the process uses a large amount of memory beyond what is allocated. This appears to be due to fragmentation, or some related memory-allocation inefficiency. The repro consists of:
- mongod running with 10 GB cache (no journal to simplify the situation)
- create a 10 GB collection of small documents called "ping", filling the cache
- create a second 10 GB collection, "pong", replacing the first in the cache
- issue a query to read the first collection "ping" back into the cache, replacing "pong"
Memory stats over the course of the run:
- from A-B "ping" is being created, and from C-D "pong" is being created, replacing "ping" in the cache
- starting at D "ping" is being read back into the cache, evicting "pong". In principle the memory freed by evicting "pong" should be usable for reading "ping" into the cache.
- however, from D-E we see heap size and central cache free bytes both increasing. It appears that the memory freed by evicting "pong" cannot for some reason be used to hold "ping": it is being returned to the central free list, and new memory is instead being obtained from the OS to hold "ping".
- at E, while "ping" is still being read into memory, we see a change in behavior: free memory appears to move from the central free list to the page heap, and WT reports that the number of pages in cache is no longer increasing. I suspect that at this point "ping" has filled the cache and we are successfully recycling the memory freed by evicting older "ping" pages to hold newer "ping" pages.
- but the net result is still about 7 GB of memory in use by the process beyond the 9.5 GB allocated (9.2 GB of it in the WT cache), or about a 75% excess.
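One way such an excess can arise is sketched below with a toy size-class allocator (a simplified model for illustration, not tcmalloc's actual implementation; the block sizes and counts are made up). If freed blocks sit on per-size free lists and are only reused for allocations of the same size, evicting one collection's buffers does nothing for the other collection's differently-sized buffers, so the heap grows even though live bytes stay flat:

```python
# Toy size-class allocator: freed blocks go on a per-size free list and are
# recycled only for allocations of exactly the same size (no coalescing, no
# cross-class reuse). Sizes are hypothetical, not WT's actual page sizes.

class ToyAllocator:
    def __init__(self):
        self.heap_bytes = 0   # total memory obtained from the "OS"
        self.free_lists = {}  # size class -> count of free blocks

    def alloc(self, size):
        if self.free_lists.get(size, 0) > 0:
            self.free_lists[size] -= 1  # recycle a same-size free block
        else:
            self.heap_bytes += size     # no match: grow the heap
        return size

    def free(self, size):
        self.free_lists[size] = self.free_lists.get(size, 0) + 1

a = ToyAllocator()
live = []
# fill the "cache" with 1000 blocks of 4 KB (think: "pong" pages)
for _ in range(1000):
    live.append(a.alloc(4096))
# evict them all, then fill with 500 blocks of 8 KB (think: "ping" pages,
# same total bytes but a different size class)
for _ in range(1000):
    a.free(live.pop())
for _ in range(500):
    live.append(a.alloc(8192))

print(sum(live), a.heap_bytes)  # 4096000 8192000: heap is 2x live bytes
```

In this toy model the heap ends up at twice the live bytes even though total demand never exceeded 4 MB, which is qualitatively the shape of the excess observed above.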
Possible explanations for why the freed memory cannot be reused:
- smaller buffers freed by evicting "pong" are discontiguous and cannot hold the larger buffers required for reading in "ping"
- the buffers freed by evicting "pong" are contiguous, but adjacent buffers are not coalesced by the allocator
- buffers are eventually coalesced by the allocator, but not in time to be used for reading in "ping"
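The coalescing hypotheses above can be sketched with a toy address-space model (hypothetical sizes and data structures, purely for illustration): two adjacent freed 4 KB holes can satisfy an 8 KB request only if the allocator merges them, so whether and when coalescing happens determines whether "ping" can reuse "pong"'s memory.

```python
# Model free memory as (start, length) holes in an address space and ask:
# what is the largest single allocation these holes can serve, with and
# without merging adjacent holes?

def largest_servable(free_ranges, coalesce):
    """Largest request servable from a list of (start, length) holes."""
    if coalesce:
        merged = []
        for start, length in sorted(free_ranges):
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1][1] += length  # merge with the adjacent hole
            else:
                merged.append([start, length])
        free_ranges = merged
    return max(length for _, length in free_ranges)

# two adjacent 4 KB holes left behind by evicting "pong" buffers
holes = [(0, 4096), (4096, 4096)]
print(largest_servable(holes, coalesce=False))  # 4096: an 8 KB request fails
print(largest_servable(holes, coalesce=True))   # 8192: an 8 KB request succeeds
```

If coalescing happens but only lazily (the third hypothesis), the 8 KB request that arrives before the merge still has to be satisfied with fresh memory from the OS, matching the heap growth seen from D-E.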