During a 20-minute run with heavy mixed workload the cache was observed to grow on occasion (2 runs out of about 30) to almost 9 GB, compared to a configured value of 5 GB. An assortment of potentially relevant metrics:
- at point B the checkpoint enters the second phase where "range of ids pinned" grows.
- at this point bytes in cache starts growing, and continues growing, apparently indefinitely.
- at C we begin to page, and performance declines correspondingly.
- if the run had continued beyond 20 minutes, the growth would likely have resulted in the process being killed by the OOM killer.
- checkpoint is running longer than it did in runs that didn't experience the issue - typically that third checkpoint completed before the end of the run.
- the spike in reported cached bytes at A, without a corresponding increase in VM, is probably the apparent accounting error from SERVER-16881, which is seen to occur frequently in these runs. However, the increase starting at B that is the subject of this ticket is accompanied by an increase in VM, so it is likely an unrelated issue.