tcmalloc may occasionally release large amounts of free pageheap memory to the kernel by calling madvise. This can take seconds when many GB of memory are involved. An internal tcmalloc lock is held while this happens, so the release can stall many threads and cause widespread latency spikes.
There is no direct metric that diagnoses this (SERVER-31380 would provide that), but it can be inferred indirectly as a likely cause from the following:
- tcmalloc pageheap free memory decreases to near zero
- tcmalloc unmapped memory increases by a corresponding amount
- resident memory decreases by the same amount
- system free memory increases by that amount
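
The four signals above can be checked mechanically against two consecutive metric samples. The following is a minimal sketch, not an existing tool: the `MemSample` type, its field names, and the thresholds (`min_bytes`, `near_zero_frac`, `tolerance`) are all assumptions chosen for illustration, and the values would need to be filled in from whatever metrics source is available.

```python
from dataclasses import dataclass


@dataclass
class MemSample:
    """One point-in-time reading, all values in bytes (hypothetical field names)."""
    pageheap_free: int      # tcmalloc pageheap free memory
    pageheap_unmapped: int  # tcmalloc unmapped memory
    resident: int           # process resident memory
    system_free: int        # system free memory


def looks_like_pageheap_release(before, after,
                                min_bytes=1 << 30,
                                near_zero_frac=0.05,
                                tolerance=0.25):
    """Return True if the change between two samples matches the indirect signature.

    The thresholds are illustrative assumptions, not recommended values.
    """
    released = before.pageheap_free - after.pageheap_free
    # Only large drops are interesting.
    if released < min_bytes:
        return False
    # Pageheap free memory must fall to near zero.
    if after.pageheap_free > near_zero_frac * before.pageheap_free:
        return False

    def close(delta):
        # The other three metrics should move by roughly the released amount.
        return abs(delta - released) <= tolerance * released

    return (close(after.pageheap_unmapped - before.pageheap_unmapped)
            and close(before.resident - after.resident)
            and close(after.system_free - before.system_free))
```

A match only makes this a likely cause. It does not prove that threads were stalled on the tcmalloc lock, so it is best read alongside the latency data for the same interval.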