|
I cant attach the actual code as its part of a propriety code base, but I will try to reproduce it with a test script. I was originally thinking that the mongod on the primary shard was just dying without an error since it used to segfault in 2.3.1, but now I see in the syslog the OOM killer killed it.
Jan 23 11:52:53 mongonode1 kernel: [498032.932403] mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jan 23 11:52:53 mongonode1 kernel: [498032.932409] mongod cpuset=/ mems_allowed=0
Jan 23 11:52:53 mongonode1 kernel: [498032.932411] Pid: 22447, comm: mongod Not tainted 3.2.0-35-generic #55-Ubuntu
Jan 23 11:52:53 mongonode1 kernel: [498032.932413] Call Trace:
Jan 23 11:52:53 mongonode1 kernel: [498032.932423] [<ffffffff810c0e7d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Jan 23 11:52:53 mongonode1 kernel: [498032.932428] [<ffffffff8111bdc1>] dump_header+0x91/0xe0
Jan 23 11:52:53 mongonode1 kernel: [498032.932430] [<ffffffff8111c145>] oom_kill_process+0x85/0xb0
Jan 23 11:52:53 mongonode1 kernel: [498032.932432] [<ffffffff8111c4ea>] out_of_memory+0xfa/0x220
Jan 23 11:52:53 mongonode1 kernel: [498032.932435] [<ffffffff81121eec>] __alloc_pages_nodemask+0x8dc/0x8f0
Jan 23 11:52:53 mongonode1 kernel: [498032.932440] [<ffffffff81158f26>] alloc_pages_current+0xb6/0x120
Jan 23 11:52:53 mongonode1 kernel: [498032.932443] [<ffffffff81118a47>] __page_cache_alloc+0xb7/0xd0
Jan 23 11:52:53 mongonode1 kernel: [498032.932446] [<ffffffff8111aca4>] filemap_fault+0x234/0x3e0
Jan 23 11:52:53 mongonode1 kernel: [498032.932463] [<ffffffff8116f5cb>] ? mem_cgroup_update_page_stat+0x2b/0x110
Jan 23 11:52:53 mongonode1 kernel: [498032.932466] [<ffffffff8113b332>] __do_fault+0x72/0x550
Jan 23 11:52:53 mongonode1 kernel: [498032.932469] [<ffffffff8165b0cf>] ? schedule+0x3f/0x60
Jan 23 11:52:53 mongonode1 kernel: [498032.932472] [<ffffffff8113e9ca>] handle_pte_fault+0xfa/0x200
Jan 23 11:52:53 mongonode1 kernel: [498032.932474] [<ffffffff8113ee88>] handle_mm_fault+0x1f8/0x350
Jan 23 11:52:53 mongonode1 kernel: [498032.932478] [<ffffffff81660ad0>] do_page_fault+0x150/0x520
Jan 23 11:52:53 mongonode1 kernel: [498032.932480] [<ffffffff8165bff4>] ? do_nanosleep+0xa4/0xd0
Jan 23 11:52:53 mongonode1 kernel: [498032.932484] [<ffffffff81090708>] ? hrtimer_nanosleep+0xb8/0x180
Jan 23 11:52:53 mongonode1 kernel: [498032.932487] [<ffffffff8108f2d0>] ? update_rmtp+0x70/0x70
Jan 23 11:52:53 mongonode1 kernel: [498032.932490] [<ffffffff810900c4>] ? hrtimer_start_range_ns+0x14/0x20
Jan 23 11:52:53 mongonode1 kernel: [498032.932492] [<ffffffff8165d735>] page_fault+0x25/0x30
Jan 23 11:52:53 mongonode1 kernel: [498032.932494] Mem-Info:
Jan 23 11:52:53 mongonode1 kernel: [498032.932495] Node 0 DMA per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932498] CPU 0: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932499] CPU 1: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932501] CPU 2: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932502] CPU 3: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932504] CPU 4: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932505] CPU 5: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932507] CPU 6: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932508] CPU 7: hi: 0, btch: 1 usd: 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932509] Node 0 DMA32 per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932511] CPU 0: hi: 186, btch: 31 usd: 37
Jan 23 11:52:53 mongonode1 kernel: [498032.932513] CPU 1: hi: 186, btch: 31 usd: 22
Jan 23 11:52:53 mongonode1 kernel: [498032.932514] CPU 2: hi: 186, btch: 31 usd: 20
Jan 23 11:52:53 mongonode1 kernel: [498032.932516] CPU 3: hi: 186, btch: 31 usd: 14
Jan 23 11:52:53 mongonode1 kernel: [498032.932517] CPU 4: hi: 186, btch: 31 usd: 15
Jan 23 11:52:53 mongonode1 kernel: [498032.932519] CPU 5: hi: 186, btch: 31 usd: 13
Jan 23 11:52:53 mongonode1 kernel: [498032.932520] CPU 6: hi: 186, btch: 31 usd: 11
Jan 23 11:52:53 mongonode1 kernel: [498032.932522] CPU 7: hi: 186, btch: 31 usd: 30
Jan 23 11:52:53 mongonode1 kernel: [498032.932523] Node 0 Normal per-cpu:
Jan 23 11:52:53 mongonode1 kernel: [498032.932525] CPU 0: hi: 186, btch: 31 usd: 26
Jan 23 11:52:53 mongonode1 kernel: [498032.932526] CPU 1: hi: 186, btch: 31 usd: 81
Jan 23 11:52:53 mongonode1 kernel: [498032.932528] CPU 2: hi: 186, btch: 31 usd: 88
Jan 23 11:52:53 mongonode1 kernel: [498032.932529] CPU 3: hi: 186, btch: 31 usd: 30
Jan 23 11:52:53 mongonode1 kernel: [498032.932530] CPU 4: hi: 186, btch: 31 usd: 39
Jan 23 11:52:53 mongonode1 kernel: [498032.932532] CPU 5: hi: 186, btch: 31 usd: 13
Jan 23 11:52:53 mongonode1 kernel: [498032.932533] CPU 6: hi: 186, btch: 31 usd: 60
Jan 23 11:52:53 mongonode1 kernel: [498032.932535] CPU 7: hi: 186, btch: 31 usd: 137
Jan 23 11:52:53 mongonode1 kernel: [498032.932538] active_anon:3543847 inactive_anon:421374 isolated_anon:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932539] active_file:267 inactive_file:555 isolated_file:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932540] unevictable:0 dirty:3 writeback:128 unstable:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932541] free:33649 slab_reclaimable:5765 slab_unreclaimable:6544
Jan 23 11:52:53 mongonode1 kernel: [498032.932541] mapped:378 shmem:5 pagetables:39476 bounce:0
Jan 23 11:52:53 mongonode1 kernel: [498032.932543] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15652kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932550] lowmem_reserve[]: 0 3246 16098 16098
Jan 23 11:52:53 mongonode1 kernel: [498032.932553] Node 0 DMA32 free:64996kB min:13616kB low:17020kB high:20424kB active_anon:2668776kB inactive_anon:534236kB active_file:464kB inactive_file:300kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3324872kB mlocked:0kB dirty:0kB writeback:52kB mapped:388kB shmem:0kB slab_reclaimable:4484kB slab_unreclaimable:3232kB kernel_stack:1096kB pagetables:30744kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2255 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932561] lowmem_reserve[]: 0 0 12852 12852
Jan 23 11:52:53 mongonode1 kernel: [498032.932564] Node 0 Normal free:53708kB min:53900kB low:67372kB high:80848kB active_anon:11506612kB inactive_anon:1151260kB active_file:604kB inactive_file:1920kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13160448kB mlocked:0kB dirty:12kB writeback:460kB mapped:1124kB shmem:20kB slab_reclaimable:18576kB slab_unreclaimable:22928kB kernel_stack:3008kB pagetables:127160kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:5549 all_unreclaimable? yes
Jan 23 11:52:53 mongonode1 kernel: [498032.932571] lowmem_reserve[]: 0 0 0 0
Jan 23 11:52:53 mongonode1 kernel: [498032.932574] Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15892kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932581] Node 0 DMA32: 583*4kB 1388*8kB 188*16kB 270*32kB 167*64kB 85*128kB 40*256kB 9*512kB 2*1024kB 1*2048kB 0*4096kB = 65596kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932588] Node 0 Normal: 12355*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 53588kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932595] 4610 total pagecache pages
Jan 23 11:52:53 mongonode1 kernel: [498032.932596] 3539 pages in swap cache
Jan 23 11:52:53 mongonode1 kernel: [498032.932597] Swap cache stats: add 6003912, delete 6000373, find 289681/334479
Jan 23 11:52:53 mongonode1 kernel: [498032.932599] Free swap = 0kB
Jan 23 11:52:53 mongonode1 kernel: [498032.932600] Total swap = 16774140kB
Jan 23 11:52:53 mongonode1 kernel: [498033.009125] 4194288 pages RAM
Jan 23 11:52:53 mongonode1 kernel: [498033.009129] 85746 pages reserved
Jan 23 11:52:53 mongonode1 kernel: [498033.009130] 5605 pages shared
Jan 23 11:52:53 mongonode1 kernel: [498033.009131] 4072393 pages non-shared
Jan 23 11:52:53 mongonode1 kernel: [498033.009132] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Jan 23 11:52:53 mongonode1 kernel: [498033.009149] [ 567] 0 567 3797 0 5 0 0 upstart-socket-
Jan 23 11:52:53 mongonode1 kernel: [498033.009153] [ 709] 0 709 12489 0 7 -17 -1000 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009156] [ 732] 101 732 82871 66 3 0 0 rsyslogd
Jan 23 11:52:53 mongonode1 kernel: [498033.009159] [ 767] 102 767 5987 0 2 0 0 dbus-daemon
Jan 23 11:52:53 mongonode1 kernel: [498033.009163] [ 904] 0 904 3946 1 4 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009166] [ 911] 0 911 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009169] [ 946] 0 946 3946 1 1 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009172] [ 947] 0 947 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009174] [ 949] 0 949 3946 1 1 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009177] [ 955] 0 955 1082 0 0 0 0 acpid
Jan 23 11:52:53 mongonode1 kernel: [498033.009180] [ 957] 0 957 4227 4 0 0 0 atd
Jan 23 11:52:53 mongonode1 kernel: [498033.009200] [ 958] 0 958 4778 21 3 0 0 cron
Jan 23 11:52:53 mongonode1 kernel: [498033.009203] [ 991] 0 991 3995 23 3 0 0 irqbalance
Jan 23 11:52:53 mongonode1 kernel: [498033.009222] [ 1046] 103 1046 46894 0 2 0 0 whoopsie
Jan 23 11:52:53 mongonode1 kernel: [498033.009242] [ 1090] 0 1090 35109 354 3 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009245] [ 1124] 0 1124 3946 1 0 0 0 getty
Jan 23 11:52:53 mongonode1 kernel: [498033.009248] [ 1163] 0 1163 18340 0 2 0 0 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009251] [ 1295] 1000 1295 18340 33 3 0 0 sshd
Jan 23 11:52:53 mongonode1 kernel: [498033.009254] [ 1296] 1000 1296 6578 171 4 0 0 bash
Jan 23 11:52:53 mongonode1 kernel: [498033.009274] [16910] 0 16910 4308 0 5 0 0 upstart-udev-br
Jan 23 11:52:53 mongonode1 kernel: [498033.009277] [16912] 0 16912 5334 1 5 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009282] [ 5636] 0 5636 5333 1 2 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009304] [ 5667] 0 5667 5333 0 4 -17 -1000 udevd
Jan 23 11:52:53 mongonode1 kernel: [498033.009307] [17769] 1001 17769 106264 3760 1 0 0 mongod
Jan 23 11:52:53 mongonode1 kernel: [498033.009310] [17787] 1001 17787 60093 735 6 0 0 mongos
Jan 23 11:52:53 mongonode1 kernel: [498033.009313] [23506] 65534 23506 120763 221 2 0 0 gmetad
Jan 23 11:52:53 mongonode1 kernel: [498033.009316] [23527] 999 23527 13224 641 2 0 0 gmond
Jan 23 11:52:53 mongonode1 kernel: [498033.009319] [22439] 1001 22439 48999141 3949504 3 0 0 mongod
Jan 23 11:52:53 mongonode1 kernel: [498033.009322] [ 5076] 33 5076 35414 712 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009325] [ 5157] 33 5157 35525 885 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009328] [ 5204] 33 5204 35346 642 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009331] [ 5321] 33 5321 35384 748 2 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009334] [ 5568] 33 5568 35458 783 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009353] [ 6073] 33 6073 35382 749 2 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009356] [ 6191] 33 6191 35382 735 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009359] [ 6341] 33 6341 35346 645 1 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009363] [ 6390] 33 6390 35426 717 6 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009366] [ 6421] 33 6421 35339 569 7 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009369] [ 6459] 33 6459 35121 385 1 0 0 apache2
Jan 23 11:52:53 mongonode1 kernel: [498033.009372] [ 6460] 33 6460 1100 25 3 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009375] [ 6461] 33 6461 1100 24 0 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009378] [ 6462] 33 6462 1100 25 7 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009381] [ 6463] 33 6463 1100 25 3 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009383] [ 6464] 33 6464 1100 25 5 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009386] [ 6465] 33 6465 1100 25 4 0 0 sh
Jan 23 11:52:53 mongonode1 kernel: [498033.009389] [ 6466] 33 6466 18196 276 6 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009392] [ 6467] 33 6467 18196 280 3 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009395] [ 6468] 33 6468 18196 276 5 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009398] [ 6469] 33 6469 18226 282 1 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009400] [ 6470] 33 6470 18226 282 6 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009403] [ 6471] 33 6471 18226 283 3 0 0 rrdtool
Jan 23 11:52:53 mongonode1 kernel: [498033.009406] Out of memory: Kill process 22439 (mongod) score 982 or sacrifice child
Jan 23 11:52:53 mongonode1 kernel: [498033.009600] Killed process 22439 (mongod) total-vm:195996564kB, anon-rss:15798016kB, file-rss:0kB
My ulimit settings:
mongo@mongonode1:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128254
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 63000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128254
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
|