[SERVER-20248] Memory growth in __wt_session_get_btree in __checkpoint_worker under WiredTiger Created: 01/Sep/15 Updated: 11/Jan/16 Resolved: 03/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | WTmem | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
This ticket is related to the latest issue discussed on
Memory outside WT cache is observed to steadily grow over a period of hours in a test on 3.0.5:
Memory profiling using tcmalloc HEAPPROFILE shows the following as a candidate for the culprit: various allocations within __checkpoint_worker (labeled "A" below) steadily grow, and account for about 1.5 GB of non-cache memory by the end of this run:
|
| Comments |
| Comment by Bruce Lucas (Inactive) [ 03/Sep/15 ] |
|
After a 5+ hour run we can confirm no memory growth on 3.0.6:
|
| Comment by Bruce Lucas (Inactive) [ 02/Sep/15 ] |
|
A one-hour run with close_idle_time=300 (five minutes) shows no memory growth. This supports the theory that the issue is an accumulation of handles created by the checkpoint. There is still a large amount of memory outside the cache: 800 MB is log slot buffer (due to the issue with that in 3.0.5), and I assume the rest is due to the large number of dhandles that are open for the 48k tables (16k collections + 32k indexes). The next run will be on 3.0.6 to a) confirm the issue still exists there and b) look at the new data-handle stats. |
| Comment by Bruce Lucas (Inactive) [ 02/Sep/15 ] |
|
A couple of improvements to the tooling more clearly show about 3 GB of memory allocated by __conn_dhandle_get within a checkpoint ("A" below), presumably not accounted for in the WT cache, rising linearly over the course of a 3-hour run.
This run had syncdelay set to 5 seconds vs default 60 seconds. That did not clearly increase the rate of memory increase, but I think that's because checkpoints were taking a very long time so the number of checkpoints was about the same in either case. |
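The syncdelay reasoning in the last comment can be illustrated with a rough back-of-the-envelope sketch. It assumes, as a simplification not taken from the ticket or from the WiredTiger source, that each checkpoint cycle lasts the checkpoint duration plus the syncdelay wait. The checkpoint durations used (1 second and 600 seconds) are hypothetical; the ticket only says checkpoints were "taking a very long time". The function name `estimated_checkpoints` is likewise made up for this illustration.

```python
def estimated_checkpoints(run_seconds, checkpoint_seconds, syncdelay_seconds):
    """Rough count of checkpoints completed during a run.

    Assumption (simplified model): one cycle = time spent checkpointing
    plus the syncdelay wait before the next checkpoint starts.
    """
    cycle_seconds = checkpoint_seconds + syncdelay_seconds
    return int(run_seconds // cycle_seconds)


RUN_SECONDS = 3 * 60 * 60  # the 3-hour run mentioned in the comment

# Hypothetical checkpoint durations: a fast one and a very slow one.
for checkpoint_seconds in (1, 600):
    for syncdelay_seconds in (60, 5):
        count = estimated_checkpoints(
            RUN_SECONDS, checkpoint_seconds, syncdelay_seconds
        )
        print(
            f"checkpoint={checkpoint_seconds}s "
            f"syncdelay={syncdelay_seconds}s -> ~{count} checkpoints"
        )
```

Under this model, when checkpoints are fast the shorter syncdelay multiplies the number of checkpoints roughly tenfold, but when each checkpoint takes on the order of ten minutes the count barely changes. That is consistent with the observation that lowering syncdelay did not clearly change the rate of memory growth.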