[SERVER-8134] Memory Leak in 2.3.0 Created: 10/Jan/13  Updated: 08/Mar/13  Resolved: 15/Feb/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.3.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark Porter Assignee: Eric Milkie
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mongoshard1graph.png    
Issue Links:
Depends
Related
Operating System: ALL
Participants:

 Description   

2.3.0 is showing a steady increase in virtual non-mapped memory, symptomatic of a memory leak.
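
A minimal way to watch the growth from the mongo shell, assuming the 2.x serverStatus() mem layout (values are in MB); this is a sketch, not part of the original report:

var m = db.serverStatus().mem;
// non-mapped virtual memory is roughly total virtual minus the mapped data files
// (mappedWithJournal when journaling is enabled)
print("virtual: " + m.virtual + " MB, mapped w/ journal: " + m.mappedWithJournal +
      " MB, non-mapped: " + (m.virtual - m.mappedWithJournal) + " MB");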



 Comments   
Comment by James Wahlin [ 15/Feb/13 ]

Note that a map-reduce triggered memory leak was recently fixed under SERVER-8442. This impacted 2.3.0 as well as earlier releases.

Comment by Ben Becker [ 28/Jan/13 ]

Hi Ian,

It's hard to identify the cause without knowing more about the workload. This JIRA ticket was filed for a leak in v2.3.0, which was still using SpiderMonkey. You could try with v2.3.0, or try building MongoDB with SpiderMonkey by passing the '--usesm' flag to scons (note that you will need to 'git checkout r2.3.2' first).

The most useful way for us to determine the cause would be a script to reproduce the leak. I can provide a secure private SCP server for you to upload scripts if that's any easier.

Thanks,
Ben

Comment by Ian Beaver [ 28/Jan/13 ]

I have not had time to create a standalone script to reproduce this yet, but I was able to test it on 2.2.3-rc1 and could not reproduce it there. With the same client code, data, and cluster configuration, 2.2.3-rc1 appears stable. After 50 minutes of running, 2.3.2 would be out of swap and killed; after 50 minutes on 2.2.3-rc1, the primary shard was only using 2 GB of RAM. Perhaps it's related to the V8 switch?
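
A quick way to confirm which JS engine a given mongod build is running, assuming the build reports it in buildInfo (this field is not exposed by every version):

db.runCommand({ buildInfo: 1 }).javascriptEngine  // e.g. "V8" or "SpiderMonkey" on builds that expose this field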

Comment by Ian Beaver [ 24/Jan/13 ]

Attached is a memory report from the primary shard showing it consuming all memory and swap and then being killed by the kernel OOM killer.

Comment by Ben Becker [ 23/Jan/13 ]

Thanks Ian. A script to reproduce would be very helpful.

Comment by Ian Beaver [ 23/Jan/13 ]

I can't attach the actual code as it's part of a proprietary code base, but I will try to reproduce it with a test script. I was originally thinking that the mongod on the primary shard was just dying without an error, since it used to segfault in 2.3.1, but now I see in the syslog that the OOM killer killed it.

Jan 23 11:52:53 mongonode1 kernel: [498032.932403] mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jan 23 11:52:53 mongonode1 kernel: [498032.932409] mongod cpuset=/ mems_allowed=0
Jan 23 11:52:53 mongonode1 kernel: [498032.932411] Pid: 22447, comm: mongod Not tainted 3.2.0-35-generic #55-Ubuntu
Jan 23 11:52:53 mongonode1 kernel: [498032.932413] Call Trace:
Jan 23 11:52:53 mongonode1 kernel: [498032.932423] [<ffffffff810c0e7d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Jan 23 11:52:53 mongonode1 kernel: [498032.932428] [<ffffffff8111bdc1>] dump_header+0x91/0xe0
Jan 23 11:52:53 mongonode1 kernel: [498032.932430] [<ffffffff8111c145>] oom_kill_process+0x85/0xb0
Jan 23 11:52:53 mongonode1 kernel: [498032.932432] [<ffffffff8111c4ea>] out_of_memory+0xfa/0x220
Jan 23 11:52:53 mongonode1 kernel: [498032.932435] [<ffffffff81121eec>] __alloc_pages_nodemask+0x8dc/0x8f0
Jan 23 11:52:53 mongonode1 kernel: [498032.932440] [<ffffffff81158f26>] alloc_pages_current+0xb6/0x120
Jan 23 11:52:53 mongonode1 kernel: [498032.932443] [<ffffffff81118a47>] __page_cache_alloc+0xb7/0xd0
Jan 23 11:52:53 mongonode1 kernel: [498032.932446] [<ffffffff8111aca4>] filemap_fault+0x234/0x3e0
Jan 23 11:52:53 mongonode1 kernel: [498032.932463] [<ffffffff8116f5cb>] ? mem_cgroup_update_page_stat+0x2b/0x110
Jan 23 11:52:53 mongonode1 kernel: [498032.932466] [<ffffffff8113b332>] __do_fault+0x72/0x550
Jan 23 11:52:53 mongonode1 kernel: [498032.932469] [<ffffffff8165b0cf>] ? schedule+0x3f/0x60
Jan 23 11:52:53 mongonode1 kernel: [498032.932472] [<ffffffff8113e9ca>] handle_pte_fault+0xfa/0x200
Jan 23 11:52:53 mongonode1 kernel: [498032.932474] [<ffffffff8113ee88>] handle_mm_fault+0x1f8/0x350
Jan 23 11:52:53 mongonode1 kernel: [498032.932478] [<ffffffff81660ad0>] do_page_fault+0x150/0x520
Jan 23 11:52:53 mongonode1 kernel: [498032.932480] [<ffffffff8165bff4>] ? do_nanosleep+0xa4/0xd0
Jan 23 11:52:53 mongonode1 kernel: [498032.932484] [<ffffffff81090708>] ? hrtimer_nanosleep+0xb8/0x180
Jan 23 11:52:53 mongonode1 kernel: [498032.932487] [<ffffffff8108f2d0>] ? update_rmtp+0x70/0x70
Jan 23 11:52:53 mongonode1 kernel: [498032.932490] [<ffffffff810900c4>] ? hrtimer_start_range_ns+0x14/0x20
Jan 23 11:52:53 mongonode1 kernel: [498032.932492] [<ffffffff8165d735>] page_fault+0x25/0x30
Jan 23 11:52:53 mongonode1 kernel: [498032.932494] Mem-Info:
Jan 23 11:52:53 mongonode1 kernel: [498032.932495] Node 0 DMA per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932498] CPU 0: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932499] CPU 1: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932501] CPU 2: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932502] CPU 3: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932504] CPU 4: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932505] CPU 5: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932507] CPU 6: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932508] CPU 7: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932509] Node 0 DMA32 per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932511] CPU 0: hi: 186, btch: 31 usd: 37
Jan 23 11:52:53 mongonode1 kernel: [498032.932513] CPU 1: hi: 186, btch: 31 usd: 22
Jan 23 11:52:53 mongonode1 kernel: [498032.932514] CPU 2: hi: 186, btch: 31 usd: 20
Jan 23 11:52:53 mongonode1 kernel: [498032.932516] CPU 3: hi: 186, btch: 31 usd: 14
Jan 23 11:52:53 mongonode1 kernel: [498032.932517] CPU 4: hi: 186, btch: 31 usd: 15
Jan 23 11:52:53 mongonode1 kernel: [498032.932519] CPU 5: hi: 186, btch: 31 usd: 13
Jan 23 11:52:53 mongonode1 kernel: [498032.932520] CPU 6: hi: 186, btch: 31 usd: 11
Jan 23 11:52:53 mongonode1 kernel: [498032.932522] CPU 7: hi: 186, btch: 31 usd: 30
Jan 23 11:52:53 mongonode1 kernel: [498032.932523] Node 0 Normal per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932525] CPU 0: hi: 186, btch: 31 usd: 26
Jan 23 11:52:53 mongonode1 kernel: [498032.932526] CPU 1: hi: 186, btch: 31 usd: 81
Jan 23 11:52:53 mongonode1 kernel: [498032.932528] CPU 2: hi: 186, btch: 31 usd: 88
Jan 23 11:52:53 mongonode1 kernel: [498032.932529] CPU 3: hi: 186, btch: 31 usd: 30
Jan 23 11:52:53 mongonode1 kernel: [498032.932530] CPU 4: hi: 186, btch: 31 usd: 39
Jan 23 11:52:53 mongonode1 kernel: [498032.932532] CPU 5: hi: 186, btch: 31 usd: 13
Jan 23 11:52:53 mongonode1 kernel: [498032.932533] CPU 6: hi: 186, btch: 31 usd: 60
Jan 23 11:52:53 mongonode1 kernel: [498032.932535] CPU 7: hi: 186, btch: 31 usd: 137
Jan 23 11:52:53 mongonode1 kernel: [498032.932538] active_anon:3543847 inactive_anon:421374 isolated_anon:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932539] active_file:267 inactive_file:555 isolated_file:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932540] unevictable:0 dirty:3 writeback:128 unstable:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932541] free:33649 slab_reclaimable:5765 slab_unreclaimable:6544
Jan 23 11:52:53 mongonode1 kernel: [498032.932541] mapped:378 shmem:5 pagetables:39476 bounce:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932543] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15652kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932550] lowmem_reserve[]: 0 3246 16098 16098
Jan 23 11:52:53 mongonode1 kernel: [498032.932553] Node 0 DMA32 free:64996kB min:13616kB low:17020kB high:20424kB active_anon:2668776kB inactive_anon:534236kB active_file:464kB inactive_file:300kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3324872kB mlocked:0kB dirty:0kB writeback:52kB mapped:388kB shmem:0kB slab_reclaimable:4484kB slab_unreclaimable:3232kB kernel_stack:1096kB pagetables:30744kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2255 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932561] lowmem_reserve[]: 0 0 12852 12852
Jan 23 11:52:53 mongonode1 kernel: [498032.932564] Node 0 Normal free:53708kB min:53900kB low:67372kB high:80848kB active_anon:11506612kB inactive_anon:1151260kB active_file:604kB inactive_file:1920kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13160448kB mlocked:0kB dirty:12kB writeback:460kB mapped:1124kB shmem:20kB slab_reclaimable:18576kB slab_unreclaimable:22928kB kernel_stack:3008kB pagetables:127160kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:5549 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932571] lowmem_reserve[]: 0 0 0 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932574] Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15892kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932581] Node 0 DMA32: 583*4kB 1388*8kB 188*16kB 270*32kB 167*64kB 85*128kB 40*256kB 9*512kB 2*1024kB 1*2048kB 0*4096kB = 65596kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932588] Node 0 Normal: 12355*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 53588kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932595] 4610 total pagecache pages
Jan 23 11:52:53 mongonode1 kernel: [498032.932596] 3539 pages in swap cache
Jan 23 11:52:53 mongonode1 kernel: [498032.932597] Swap cache stats: add 6003912, delete 6000373, find 289681/334479
Jan 23 11:52:53 mongonode1 kernel: [498032.932599] Free swap = 0kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932600] Total swap = 16774140kB
Jan 23 11:52:53 mongonode1 kernel: [498033.009125] 4194288 pages RAM
Jan 23 11:52:53 mongonode1 kernel: [498033.009129] 85746 pages reserved
Jan 23 11:52:53 mongonode1 kernel: [498033.009130] 5605 pages shared
Jan 23 11:52:53 mongonode1 kernel: [498033.009131] 4072393 pages non-shared
Jan 23 11:52:53 mongonode1 kernel: [498033.009132] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Jan 23 11:52:53 mongonode1 kernel: [498033.009149] [ 567] 0 567 3797 0 5 0 0 upstart-socket-
Jan 23 11:52:53 mongonode1 kernel: [498033.009153] [ 709] 0 709 12489 0 7 -17 -1000 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009156] [ 732] 101 732 82871 66 3 0 0 rsyslogd
Jan 23 11:52:53 mongonode1 kernel: [498033.009159] [ 767] 102 767 5987 0 2 0 0 dbus-daemon
Jan 23 11:52:53 mongonode1 kernel: [498033.009163] [ 904] 0 904 3946 1 4 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009166] [ 911] 0 911 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009169] [ 946] 0 946 3946 1 1 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009172] [ 947] 0 947 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009174] [ 949] 0 949 3946 1 1 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009177] [ 955] 0 955 1082 0 0 0 0 acpid
Jan 23 11:52:53 mongonode1 kernel: [498033.009180] [ 957] 0 957 4227 4 0 0 0 atd
Jan 23 11:52:53 mongonode1 kernel: [498033.009200] [ 958] 0 958 4778 21 3 0 0 cron
Jan 23 11:52:53 mongonode1 kernel: [498033.009203] [ 991] 0 991 3995 23 3 0 0 irqbalance
Jan 23 11:52:53 mongonode1 kernel: [498033.009222] [ 1046] 103 1046 46894 0 2 0 0 whoopsie
Jan 23 11:52:53 mongonode1 kernel: [498033.009242] [ 1090] 0 1090 35109 354 3 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009245] [ 1124] 0 1124 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009248] [ 1163] 0 1163 18340 0 2 0 0 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009251] [ 1295] 1000 1295 18340 33 3 0 0 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009254] [ 1296] 1000 1296 6578 171 4 0 0 bash
Jan 23 11:52:53 mongonode1 kernel: [498033.009274] [16910] 0 16910 4308 0 5 0 0 upstart-udev-br
Jan 23 11:52:53 mongonode1 kernel: [498033.009277] [16912] 0 16912 5334 1 5 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009282] [ 5636] 0 5636 5333 1 2 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009304] [ 5667] 0 5667 5333 0 4 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009307] [17769] 1001 17769 106264 3760 1 0 0 mongod
Jan 23 11:52:53 mongonode1 kernel: [498033.009310] [17787] 1001 17787 60093 735 6 0 0 mongos
Jan 23 11:52:53 mongonode1 kernel: [498033.009313] [23506] 65534 23506 120763 221 2 0 0 gmetad
Jan 23 11:52:53 mongonode1 kernel: [498033.009316] [23527] 999 23527 13224 641 2 0 0 gmond
Jan 23 11:52:53 mongonode1 kernel: [498033.009319] [22439] 1001 22439 48999141 3949504 3 0 0 mongod
Jan 23 11:52:53 mongonode1 kernel: [498033.009322] [ 5076] 33 5076 35414 712 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009325] [ 5157] 33 5157 35525 885 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009328] [ 5204] 33 5204 35346 642 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009331] [ 5321] 33 5321 35384 748 2 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009334] [ 5568] 33 5568 35458 783 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009353] [ 6073] 33 6073 35382 749 2 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009356] [ 6191] 33 6191 35382 735 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009359] [ 6341] 33 6341 35346 645 1 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009363] [ 6390] 33 6390 35426 717 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009366] [ 6421] 33 6421 35339 569 7 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009369] [ 6459] 33 6459 35121 385 1 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009372] [ 6460] 33 6460 1100 25 3 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009375] [ 6461] 33 6461 1100 24 0 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009378] [ 6462] 33 6462 1100 25 7 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009381] [ 6463] 33 6463 1100 25 3 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009383] [ 6464] 33 6464 1100 25 5 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009386] [ 6465] 33 6465 1100 25 4 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009389] [ 6466] 33 6466 18196 276 6 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009392] [ 6467] 33 6467 18196 280 3 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009395] [ 6468] 33 6468 18196 276 5 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009398] [ 6469] 33 6469 18226 282 1 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009400] [ 6470] 33 6470 18226 282 6 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009403] [ 6471] 33 6471 18226 283 3 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009406] Out of memory: Kill process 22439 (mongod) score 982 or sacrifice child
Jan 23 11:52:53 mongonode1 kernel: [498033.009600] Killed process 22439 (mongod) total-vm:195996564kB, anon-rss:15798016kB, file-rss:0kB

My ulimit settings:

mongo@mongonode1:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128254
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 63000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128254
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Comment by Ben Becker [ 23/Jan/13 ]

Hi Ian,

Would it be possible to attach a copy of your MapReduce script? Also, could you clarify what you mean by "silently dies"?

Comment by Ian Beaver [ 23/Jan/13 ]

I am seeing what I think is this issue in 2.3.2 as well. When running map/reduce jobs that write out to a collection and then drop it later, the memory never seems to be freed. In 2.3.1 the primary shard would quickly march through all memory and swap and then segfault. In 2.3.2 it marches through all memory and swap, slows to a crawl for a while, and then silently dies.
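
A minimal sketch of the kind of workload described, for the mongo shell; this is not the actual proprietary code, and the collection name, key field, and map/reduce functions are made up:

var map = function () { emit(this.key, 1); };
var reduce = function (key, values) { return Array.sum(values); };

for (var i = 0; i < 1000; i++) {
    var out = "mr_tmp_" + i;
    // write the map/reduce results out to a temporary collection...
    db.events.mapReduce(map, reduce, { out: out });
    // ...then drop it; the memory used by the job never seems to be released
    db.getCollection(out).drop();
}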

I have more info on the mongodb-user group: https://groups.google.com/forum/#!topic/mongodb-user/mL3K5JbAa1Q

Comment by Scott Hernandez (Inactive) [ 10/Jan/13 ]

Probably best to get it tested in 2.3.2 when it comes out, since it is a development release and so much has changed.
