[SERVER-6141] Can't successfully replicate our shards anymore: replication isn't using memory efficiently and Linux is invoking oom_killer to kill mongod. Servers replicated earlier on the same config (with smaller data sets) are still working fine. Created: 20/Jun/12  Updated: 09/Jul/16  Resolved: 15/Dec/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: John Albietz Assignee: Andy Schwerin
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centos 6


Issue Links:
Duplicate
duplicates SERVER-6414 use regular file io, not mmap for ext... Closed
Related
related to SERVER-5312 Resync fails due to out of memory Closed
Operating System: Linux
Participants:

 Description   

We have an issue with replication that is preventing us from successfully adding any new nodes to the replica sets in our main sharded Mongo cluster (4 shards). The team here is evaluating a move to a different database platform, but I'm hoping there's a patch or workaround that will let us keep growing the cluster without having to boot up new boxes with double the RAM just to replicate successfully.

From a new box (a dual-core, 8 GB VM instance on Joyent Cloud running CentOS 6), we installed the latest Mongo 2.0.6 and started replication for the shard. The external sort finished, but then mongod eventually crashed. Here's the end of the Mongo log from /var/log/mongo/mongod.log.

(Note how fast it was going initially and how slow it got near the end: it took 23 minutes to get to 80%, and then roughly 3 hours to get from 80% to 89% before it crashed.)

Wed Jun 20 02:04:51 [rsSync] external sort used : 113 files in 728 secs
750000/111903891 0%
1497600/111903891 1%
2312200/111903891 2%
3064100/111903891 2%
3844600/111903891 3%
4562100/111903891 4%
5265400/111903891 4%
5999500/111903891 5%
6776200/111903891 6%
7537600/111903891 6%
8248300/111903891 7%
9042000/111903891 8%
9840000/111903891 8%
10581600/111903891 9%
11316200/111903891 10%
12021600/111903891 10%
12783200/111903891 11%
13545800/111903891 12%
14290000/111903891 12%
15025900/111903891 13%
15714800/111903891 14%
16569400/111903891 14%
17312500/111903891 15%
18118200/111903891 16%
19574500/111903891 17%
20338300/111903891 18%
20966200/111903891 18%
21679500/111903891 19%
22386000/111903891 20%
Wed Jun 20 02:09:54 [FileAllocator] allocating new datafile /data/mongo/chartboost.23, filling with zeroes...
Wed Jun 20 02:09:54 [FileAllocator] done allocating datafile /data/mongo/chartboost.23, size: 2047MB, took 0.022 secs
22987800/111903891 20%
23791400/111903891 21%
24548300/111903891 21%
25323700/111903891 22%
26097700/111903891 23%
26826700/111903891 23%
27597400/111903891 24%
28278800/111903891 25%
28989300/111903891 25%
29672100/111903891 26%
30367600/111903891 27%
31128000/111903891 27%
31857600/111903891 28%
32678400/111903891 29%
33371000/111903891 29%
34081400/111903891 30%
34828300/111903891 31%
35520100/111903891 31%
36235000/111903891 32%
37028900/111903891 33%
37828200/111903891 33%
38508000/111903891 34%
39263700/111903891 35%
40103300/111903891 35%
Wed Jun 20 02:13:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300000ms
40792200/111903891 36%
Wed Jun 20 02:14:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300000ms
Wed Jun 20 02:14:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300000ms
41561700/111903891 37%
42327600/111903891 37%
Wed Jun 20 02:14:26 [rsSync] old journal file will be removed: /data/mongo/journal/j._19
43119100/111903891 38%
43918300/111903891 39%
44654100/111903891 39%
45450900/111903891 40%
46157500/111903891 41%
46893000/111903891 41%
Wed Jun 20 02:15:25 [FileAllocator] allocating new datafile /data/mongo/chartboost.24, filling with zeroes...
Wed Jun 20 02:15:25 [FileAllocator] done allocating datafile /data/mongo/chartboost.24, size: 2047MB, took 0.017 secs
47653800/111903891 42%
48332700/111903891 43%
48959600/111903891 43%
49679300/111903891 44%
50481100/111903891 45%
51233700/111903891 45%
51964700/111903891 46%
52710700/111903891 47%
53529000/111903891 47%
54276100/111903891 48%
54958500/111903891 49%
55740700/111903891 49%
56435700/111903891 50%
57150400/111903891 51%
57891900/111903891 51%
58603700/111903891 52%
59331000/111903891 53%
59991700/111903891 53%
60643500/111903891 54%
61331300/111903891 54%
61919100/111903891 55%
Wed Jun 20 02:18:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300001ms
62555700/111903891 55%
Wed Jun 20 02:19:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300000ms
Wed Jun 20 02:19:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300000ms
63141000/111903891 56%
63762200/111903891 56%
64425700/111903891 57%
65007000/111903891 58%
Wed Jun 20 02:19:44 [FileAllocator] allocating new datafile /data/mongo/chartboost.25, filling with zeroes...
Wed Jun 20 02:19:44 [FileAllocator] done allocating datafile /data/mongo/chartboost.25, size: 2047MB, took 0.039 secs
65603700/111903891 58%
66170700/111903891 59%
66768000/111903891 59%
67311700/111903891 60%
67961300/111903891 60%
68572100/111903891 61%
69221800/111903891 61%
69678300/111903891 62%
70327700/111903891 62%
70968700/111903891 63%
71560600/111903891 63%
72199600/111903891 64%
72809500/111903891 65%
Wed Jun 20 02:21:57 [rsSync] old journal file will be removed: /data/mongo/journal/j._20
Wed Jun 20 02:21:57 [rsSync] old journal file will be removed: /data/mongo/journal/j._21
73429700/111903891 65%
74121000/111903891 66%
74753000/111903891 66%
75357200/111903891 67%
75864200/111903891 67%
76519400/111903891 68%
77118800/111903891 68%
77778800/111903891 69%
78456300/111903891 70%
79073300/111903891 70%
79705400/111903891 71%
80272500/111903891 71%
Wed Jun 20 02:23:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300000ms
80812700/111903891 72%
Wed Jun 20 02:24:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300000ms
Wed Jun 20 02:24:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300000ms
81429300/111903891 72%
82046700/111903891 73%
82644800/111903891 73%
Wed Jun 20 02:24:31 [FileAllocator] allocating new datafile /data/mongo/chartboost.26, filling with zeroes...
Wed Jun 20 02:24:31 [FileAllocator] done allocating datafile /data/mongo/chartboost.26, size: 2047MB, took 0.036 secs
83297000/111903891 74%
83904100/111903891 74%
84475200/111903891 75%
84989600/111903891 75%
85581700/111903891 76%
86154700/111903891 76%
86727400/111903891 77%
87234200/111903891 77%
87742600/111903891 78%
88299400/111903891 78%
88826000/111903891 79%
89226400/111903891 79%
89829800/111903891 80%
90382700/111903891 80%
Wed Jun 20 02:26:53 [PeriodicTask::Runner] task: WriteBackManager::cleaner took: 8ms
90954000/111903891 81%
91552200/111903891 81%
92062900/111903891 82%
92621600/111903891 82%
93130900/111903891 83%
93660300/111903891 83%
Wed Jun 20 02:27:58 [FileAllocator] allocating new datafile /data/mongo/chartboost.27, filling with zeroes...
Wed Jun 20 02:27:58 [FileAllocator] done allocating datafile /data/mongo/chartboost.27, size: 2047MB, took 0.022 secs
94167700/111903891 84%
94625700/111903891 84%
95039400/111903891 84%
95456400/111903891 85%
95830800/111903891 85%
96166400/111903891 85%
Wed Jun 20 02:28:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300024ms
Wed Jun 20 02:29:00 [rsSync] old journal file will be removed: /data/mongo/journal/j._22
96491100/111903891 86%
Wed Jun 20 02:29:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:29:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
96751100/111903891 86%
97008200/111903891 86%
97280500/111903891 86%
97498000/111903891 87%
97898500/111903891 87%
98290900/111903891 87%
98304600/111903891 87%
98315900/111903891 87%
98318400/111903891 87%
98318900/111903891 87%
98319500/111903891 87%
98361100/111903891 87%
98374600/111903891 87%
98377200/111903891 87%
98385200/111903891 87%
98390700/111903891 87%
98404000/111903891 87%
98424800/111903891 87%
98426800/111903891 87%
98428300/111903891 87%
98430400/111903891 87%
98430900/111903891 87%
98431400/111903891 87%
98432500/111903891 87%
98435500/111903891 87%
98436500/111903891 87%
98438800/111903891 87%
98439900/111903891 87%
Wed Jun 20 02:33:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300001ms
98441300/111903891 87%
Wed Jun 20 02:34:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:34:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
98441800/111903891 87%
98443100/111903891 87%
98444400/111903891 87%
98445000/111903891 87%
98446300/111903891 87%
98448400/111903891 87%
98451300/111903891 87%
98455600/111903891 87%
98458100/111903891 87%
98459000/111903891 87%
98461100/111903891 87%
98461800/111903891 87%
98462500/111903891 87%
98463900/111903891 87%
98464700/111903891 87%
98465700/111903891 87%
98466700/111903891 87%
98467200/111903891 87%
98467700/111903891 87%
98468200/111903891 87%
98468700/111903891 87%
98469800/111903891 87%
98470300/111903891 87%
98470700/111903891 87%
98471000/111903891 87%
98475600/111903891 88%
98477800/111903891 88%
98479800/111903891 88%
Wed Jun 20 02:38:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300001ms
98481600/111903891 88%
Wed Jun 20 02:39:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:39:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
98484300/111903891 88%
98486100/111903891 88%
98487200/111903891 88%
98487900/111903891 88%
98507100/111903891 88%
98510000/111903891 88%
98510500/111903891 88%
98512100/111903891 88%
98513000/111903891 88%
98513500/111903891 88%
98514300/111903891 88%
98514900/111903891 88%
98515700/111903891 88%
98516200/111903891 88%
98516700/111903891 88%
98517600/111903891 88%
98518800/111903891 88%
98519300/111903891 88%
98520300/111903891 88%
98520800/111903891 88%
98521900/111903891 88%
98523100/111903891 88%
98523700/111903891 88%
98525700/111903891 88%
98526400/111903891 88%
98527100/111903891 88%
98527600/111903891 88%
98528100/111903891 88%
Wed Jun 20 02:43:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300001ms
98541800/111903891 88%
Wed Jun 20 02:44:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:44:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
98546300/111903891 88%
98548400/111903891 88%
98549100/111903891 88%
98550000/111903891 88%
98551300/111903891 88%
98552600/111903891 88%
98553100/111903891 88%
98553600/111903891 88%
98554200/111903891 88%
98555300/111903891 88%
98555800/111903891 88%
98556300/111903891 88%
98556800/111903891 88%
98557400/111903891 88%
98558000/111903891 88%
98559200/111903891 88%
98560200/111903891 88%
98562900/111903891 88%
98564700/111903891 88%
98566300/111903891 88%
98567400/111903891 88%
98568800/111903891 88%
98570100/111903891 88%
98570700/111903891 88%
98571300/111903891 88%
98571700/111903891 88%
98572100/111903891 88%
98572600/111903891 88%
Wed Jun 20 02:48:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300000ms
Wed Jun 20 02:49:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:49:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
98573100/111903891 88%
98573500/111903891 88%
98573800/111903891 88%
98574300/111903891 88%
98574800/111903891 88%
98575300/111903891 88%
98575800/111903891 88%
98576400/111903891 88%
98578200/111903891 88%
98578800/111903891 88%
98579200/111903891 88%
98579500/111903891 88%
98579900/111903891 88%
98580200/111903891 88%
98581400/111903891 88%
98582700/111903891 88%
98583600/111903891 88%
98584400/111903891 88%
98586100/111903891 88%
98587000/111903891 88%
98587800/111903891 88%
98588300/111903891 88%
98588900/111903891 88%
98590200/111903891 88%
98590700/111903891 88%
98591300/111903891 88%
98593400/111903891 88%
98593900/111903891 88%
Wed Jun 20 02:53:53 [conn2] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6f27fb0b7403c30c7021') }

ntoreturn:1 reslen:44 300000ms
98594400/111903891 88%
Wed Jun 20 02:54:02 [conn22] command admin.$cmd command:

{ writebacklisten: ObjectId('4fcf6ee0c637de224e7ecb5d') }

ntoreturn:1 reslen:44 300001ms
Wed Jun 20 02:54:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
98595100/111903891 88%
98596100/111903891 88%
98597100/111903891 88%

And so on... 4 hours later it's now only up to 89%:

Wed Jun 20 05:39:03 [conn23] command admin.$cmd command:

{ writebacklisten: ObjectId('4fbc4fe47fc06f85f192440a') }

ntoreturn:1 reslen:44 300001ms
99588700/111903891 88%
99589900/111903891 88%
99590400/111903891 88%
99590800/111903891 88%
99591800/111903891 88%
99592700/111903891 88%
99593000/111903891 88%
99593700/111903891 88%
99594100/111903891 88%
99594400/111903891 88%
99594800/111903891 89%
99596500/111903891 89%
99597500/111903891 89%
99598900/111903891 89%
99599400/111903891 89%
99600200/111903891 89%
99600600/111903891 89%
99601600/111903891 89%
99602600/111903891 89%
99603600/111903891 89%
99605300/111903891 89%
99605800/111903891 89%
99606500/111903891 89%
99607300/111903891 89%
99609500/111903891 89%
99610100/111903891 89%
99611000/111903891 89%

Checking the /var/log/messages file, we saw that the Linux oom_killer was being invoked and killing mongod.

When mongod adds a new member to a replica set, the data seems to transfer over just fine, but then it runs out of memory while building its indexes, sort tables, etc.

We have other members of the replica set running just fine with 8 GB RAM and a 1 GB swap drive...

I've only ever experienced oom_killer being invoked once or twice, and only when things were REALLY bad on the server: always something configured wrong and the machine running out of disk space and RAM.

The Mongo docs state that mongod isn't supposed to do this.

We aren't booting up small boxes here... this failure is happening on a dual-core machine with 8 GB RAM and a 1 GB swap disk.

Mongo Docs

http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-Swap

"Swap

It is useful for the linux kernel to have swap space to use in emergencies. Because of the way MongoDB memory maps the database files none of this data will ever end up in swap; this means that on a healthy system the swap space will rarely be used on a system only running MongoDB. Having swap can keep the kernel from killing MongoDB when physical memory limits are reached."

http://www.mongodb.org/display/DOCS/The+Linux+Out+of+Memory+OOM+Killer

"You may also want to look at using something which compresses swap/memory like compcache.

MongoDB uses memory mapped files. The entire data is mapped. Over time, if there is no memory pressure, the mongod resident bytes may approach total memory, as the resident bytes includes file system cache bytes for the file pages open and touched by mongod."
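(Editor's aside: one quick, hedged way to see the resident/mapped split the docs describe is the mem section of serverStatus, which reports values in MB on 2.0. A minimal sketch, run from the syncing box; add --host/--port if mongod isn't on the default port:)

mongo --quiet --eval 'var m = db.serverStatus().mem; print("resident MB: " + m.resident + ", virtual MB: " + m.virtual + ", mapped MB: " + m.mapped)'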

/var/log/messages

and the oom info in the /var/log/messages file:

Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.177205] mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.178669] mongod cpuset=/ mems_allowed=0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.179406] Pid: 11935, comm: mongod Not tainted 3.1.10joyent-centos-6-opt #1
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.180830] Call Trace:
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.181467] [<ffffffff810a16a4>] dump_header+0x84/0x1e0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.182273] [<ffffffff810a13aa>] ? find_lock_task_mm+0x2a/0x70
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.183125] [<ffffffff810a1c5d>] oom_kill_process+0x7d/0x270
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.183926] [<ffffffff810a2058>] out_of_memory+0x208/0x3a0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.184715] [<ffffffff810a5eb7>] __alloc_pages_nodemask+0x787/0x7a0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.185548] [<ffffffff810d54c0>] alloc_pages_current+0xa0/0x100
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.186367] [<ffffffff8109eabf>] __page_cache_alloc+0x7f/0x90
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.187178] [<ffffffff8109e589>] ? find_get_page+0x19/0xa0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.187966] [<ffffffff8109f71b>] filemap_fault+0x1ab/0x4d0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.188755] [<ffffffff810bd084>] __do_fault+0x74/0x510
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.189522] [<ffffffff810bd59f>] handle_pte_fault+0x7f/0xa70
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.190404] [<ffffffff810e184c>] ? mem_cgroup_count_vm_event+0x1c/0x50
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.191307] [<ffffffff810be0ce>] handle_mm_fault+0x13e/0x250
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.192142] [<ffffffff81025c9d>] do_page_fault+0x12d/0x3b0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.192972] [<ffffffff81001037>] ? __switch_to+0x227/0x2f0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.193806] [<ffffffff8157ad85>] ? __schedule+0x375/0x7c0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.194613] [<ffffffff8157d62f>] page_fault+0x1f/0x30
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.195403] Mem-Info:
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.196022] Node 0 DMA per-cpu:
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.196720] CPU 0: hi: 0, btch: 1 usd: 0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.197481] CPU 1: hi: 0, btch: 1 usd: 0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.198255] Node 0 DMA32 per-cpu:
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.198962] CPU 0: hi: 186, btch: 31 usd: 82
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.199727] CPU 1: hi: 186, btch: 31 usd: 153
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.200558] Node 0 Normal per-cpu:
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.201260] CPU 0: hi: 186, btch: 31 usd: 175
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202059] CPU 1: hi: 186, btch: 31 usd: 32
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202891] active_anon:1665611 inactive_anon:301926 isolated_anon:0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202893] active_file:11 inactive_file:1308 isolated_file:31
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202894] unevictable:0 dirty:0 writeback:0 unstable:0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202896] free:11493 slab_reclaimable:679 slab_unreclaimable:1676
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.202897] mapped:98 shmem:1 pagetables:37659 bounce:0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.207035] Node 0 DMA free:15908kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15652kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.212201] lowmem_reserve[]: 0 3504 8040 8040
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.213231] Node 0 DMA32 free:23140kB min:4996kB low:6244kB high:7492kB active_anon:2913228kB inactive_anon:582760kB active_file:72kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3588340kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:364kB slab_unreclaimable:336kB kernel_stack:16kB pagetables:57444kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2864 all_unreclaimable? yes
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.218476] lowmem_reserve[]: 0 0 4536 4536
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.219331] Node 0 Normal free:6924kB min:6468kB low:8084kB high:9700kB active_anon:3749216kB inactive_anon:624944kB active_file:0kB inactive_file:5104kB unevictable:0kB isolated(anon):0kB isolated(file):124kB present:4644864kB mlocked:0kB dirty:0kB writeback:0kB mapped:560kB shmem:0kB slab_reclaimable:2352kB slab_unreclaimable:6368kB kernel_stack:800kB pagetables:93192kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:157 all_unreclaimable? no
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.224709] lowmem_reserve[]: 0 0 0 0
Jun 20 08:05:36 c4fabbdf-f958-4e1f-b159-c6ff18df2020 kernel: [286991.225523] Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB



 Comments   
Comment by Eliot Horowitz (Inactive) [ 15/Dec/12 ]

If this wasn't fixed by the previous change, please let us know.

Comment by John Albietz [ 11/Jul/12 ]

Wow that sounds great. Thank you for the update.

Comment by Eliot Horowitz (Inactive) [ 11/Jul/12 ]

We committed what we hope fixes this issue under SERVER-6414.
We'll be putting it into 2.0.7, and hopefully that solves it.

Comment by Daniel Pasette (Inactive) [ 02/Jul/12 ]

@ashish, could you provide a representative document from the data set that exhibits the problem? Can you provide collection stats, hardware details, and index definitions as well?

Comment by Ashish Pandey [ 01/Jul/12 ]

We see the same problem, since we regularly rebuild our replica nodes to free up disk space (compact does not free up disk space), and we have been able to reproduce the issue consistently.

This happens when the total size of an index goes over available memory (physical + swap) during phase 2 of the index build. It appears as if phase 2 of the index build keeps everything in memory until the commit at the end (at which point we start to see memory usage tail off).

We have worked around this by adding more swap (which is only ever used during the index build on initial sync), but this is severely limiting and frustrating on a large data set.
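(Editor's aside: a minimal sketch of the swap-file workaround described above, assuming root and a filesystem with space to spare; the /swapfile2 path and 8 GB size are arbitrary:)

dd if=/dev/zero of=/swapfile2 bs=1M count=8192   # create an 8 GB file
chmod 600 /swapfile2
mkswap /swapfile2
swapon /swapfile2
swapon -s                                        # confirm the extra swap is active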

Also, this is not data-size dependent. We have reproduced it on machines with less memory and a correspondingly smaller data set.

Comment by John Albietz [ 28/Jun/12 ]

Our _id keys don't get long. They're all BinData.

These sound like related issues:

https://jira.mongodb.org/browse/SERVER-2731
https://jira.mongodb.org/browse/SERVER-3160

It sounds like updating the code that builds the indexes will likely eliminate this issue. Is there any way to go ahead with integrating the "new compact command" into the initial sync logic?

Comment by Andy Schwerin [ 25/Jun/12 ]

Likely same problem.

Comment by Andy Schwerin [ 25/Jun/12 ]

I don't think it's relevant, but how long do your _id keys get? Also, are they all BinData?

Comment by John Albietz [ 25/Jun/12 ]

Andy,

Thanks for the update... please let me know if you come up with anything. We're still in a holding pattern with our Mongo cluster... we still can't add more members to the replica sets. And let me know if you need any more data to help with your testing.

best,
John

Comment by Andy Schwerin [ 25/Jun/12 ]

FYI, haven't forgotten this. I've needed to do a little research on our btree implementation, to check a hypothesis.

Comment by John Albietz [ 21/Jun/12 ]

The index that is stalling is on chartboost.identifiers:

Tue Jun 19 14:43:33 [rsSync] 100311571 objects cloned so far from collection chartboost.identifiers
Tue Jun 19 14:44:06 [rsSync] clone chartboost.identifiers 102239359
Tue Jun 19 14:44:34 [rsSync] 103962186 objects cloned so far from collection chartboost.identifiers
Tue Jun 19 14:45:06 [rsSync] clone chartboost.identifiers 105925887
Tue Jun 19 14:45:35 [rsSync] 107701193 objects cloned so far from collection chartboost.identifiers
Tue Jun 19 14:45:40 [FileAllocator] allocating new datafile /data/mongo/chartboost.22, filling with zeroes...
Tue Jun 19 14:45:40 [FileAllocator] done allocating datafile /data/mongo/chartboost.22, size: 2047MB, took 0.021 secs
Tue Jun 19 14:46:06 [rsSync] clone chartboost.identifiers 109648511
Tue Jun 19 14:46:35 [rsSync] build index chartboost.identifiers { _id: 1 }

2000000/111540437 1%

Here's the output from one of the other members of the same shard replica set. I ran:

db.identifiers.stats()
{
    "ns" : "chartboost.identifiers",
    "count" : 114063128,
    "size" : 15325951900,
    "avgObjSize" : 134.36377003443215,
    "storageSize" : 18357182416,
    "numExtents" : 39,
    "nindexes" : 1,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1.0099999999932838,
    "flags" : 1,
    "totalIndexSize" : 13946277408,
    "indexSizes" : {
        "_id_" : 13946277408
    },
    "ok" : 1
}
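(Editor's aside, for scale: totalIndexSize here is 13,946,277,408 bytes, roughly 13 GB, which already exceeds the 8 GB RAM + 1 GB swap on the new node; that lines up with the phase-2 index-build observation in the 01/Jul/12 comment above.)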

Here's an example of the type of data we have in the db. Nothing fancy or
complicated.

PRIMARY> db.identifiers.find().limit(1)

{ "date_created" : 1336641792, "_id" : BinData(2,"HgAAADM1MzE2MjM5MzIzMzYyMzkzMTYzMzc2NTM2MzQzMw=="), "t" : "nid", "d" : "1c7e6431c7e643", "date_modified" : 1336641792 }

And here are the indexes:
PRIMARY> db.identifiers.getIndexKeys()
[

{ "_id" : 1 }

]

Hope this helps...

Comment by Andy Schwerin [ 21/Jun/12 ]

SERVER-6161 was a bust. Not surprising; if I had been right about it, pretty much nobody would be able to replicate. As it is, the problem you're experiencing is somewhat rare in the field.

I see you've got around 112 million documents in the collection you're indexing. Could you tell me a little more about (1) the schema of the documents in the collection, and (2) which index is stalling out (it should be logged just above the mongo log lines you pasted in)? For example, is it an index over a field that sometimes contains arrays? A geo index? I'm hoping to construct a repro in house.

Comment by John Albietz [ 21/Jun/12 ]

Andy, thanks for the update. Do you need any more information to help track down the issue?

Comment by Andy Schwerin [ 21/Jun/12 ]

Still tracking down. There's an inappropriate use of virtual address space in the BSONObjExternalSorter (SERVER-6161), but I'm not yet convinced that it's directly responsible for your observed behavior.

Comment by John Albietz [ 20/Jun/12 ]

1. db.stats():

{
    "db" : "chartboost",
    "collections" : 4,
    "objects" : 179412301,
    "avgObjSize" : 164.59773393129828,
    "dataSize" : 29530858184,
    "storageSize" : 34567950256,
    "numExtents" : 79,
    "indexes" : 2,
    "indexSize" : 20192929456,
    "fileSize" : 64326991872,
    "nsSizeMB" : 16,
    "ok" : 1
}
2.
cat /proc/sys/vm/overcommit_memory
result:
0

On the box a few minutes after starting the replication:

free -m
             total       used       free     shared    buffers     cached
Mem:          7989       1839       6150          0         35       1614
-/+ buffers/cache:        189       7800
Swap:         1023          5       1018

~]# iostat -xm
Linux 3.1.10joyent-centos-6-opt 06/20/2012 x86_64 (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
3.34 0.00 8.50 3.08 0.00 85.08

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 0.02 3.94 13.21 3.13 0.54 0.03 71.10 0.03 1.62 0.13 0.22
vdb 0.01 7.37 395.39 7.09 50.47 1.99 266.93 0.49 1.22 0.31 12.29

3. Reproducing it now with mongod at log level 2... it starts replication all over again after each failure, so it'll take a little time...

4. mongo]# du -h
1.6G ./journal
4.0K ./_tmp
48G .

Disabling the oom-killer and increasing the swap disk isn't a tenable solution at this point. Replication will take a few days to finish the way it is set up now... let's fix the root issue.

Comment by siddharth.singh@10gen.com [ 20/Jun/12 ]

1. I want to know how big your database is and what the size of your indexes is.

db.stats() will give stats on dataSize and index size for a particular db. I don't need the stats from every single database; I just need a rough number telling me how big your total database is and what your total index size is.

db.collection.stats() gives information about collections. If you could run that on some of your bigger collections, it would give me an idea of the largest index size as well. Again, a ballpark number will do.

2. Output from free -m, iostat -xm 2, and cat /proc/sys/vm/overcommit_memory.

3. Can you please try reproducing the issue with log level 2 on the node that you are trying to resync, and attach the logs to this ticket? Log level 2 prints more detail while the external sort runs and will help show exactly what's going on. To run a mongod instance at a higher verbosity level, just pass an extra -vv argument on the command line when you start it.

4. Finally, please run du -h on the mongo data directory and post the details as well. (A sketch pulling these together follows this list.)
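(Editor's aside: a minimal sketch that collects the requested diagnostics in one pass; the database name chartboost and data path /data/mongo come from this ticket, the config file path is an assumption, and the mongo calls may need --host/--port for your deployment:)

mongo --quiet --eval 'printjson(db.getSiblingDB("chartboost").stats())'               # 1. database sizes
mongo --quiet --eval 'printjson(db.getSiblingDB("chartboost").identifiers.stats())'   # 1. largest collection
free -m                                                                               # 2. memory
iostat -xm 2 5                                                                        # 2. disk, 5 samples
cat /proc/sys/vm/overcommit_memory                                                    # 2. overcommit policy
du -h /data/mongo                                                                     # 4. data directory size
mongod -f /etc/mongod.conf -vv                                                        # 3. restart the resyncing node with extra verbosity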

I was suggesting a bigger swap file to get you guys moving past the blocker state while we continue to work on debugging this issue. You can also try setting /proc/<pid>/oom_adj to OOM_DISABLE.
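(Editor's aside: a sketch of the oom_adj suggestion, assuming root; on this kernel the legacy oom_adj knob is still present, and OOM_DISABLE corresponds to -17. The setting is per-process, so it has to be reapplied whenever mongod restarts:)

MONGOD_PID=$(pgrep -o mongod)           # oldest mongod process; adjust if several are running
echo -17 > /proc/$MONGOD_PID/oom_adj    # -17 == OOM_DISABLE
cat /proc/$MONGOD_PID/oom_adj           # verify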

Comment by John Albietz [ 20/Jun/12 ]

Thanks for the response:

1. OK... please tell me specifically which mongo commands you want to see the output from.
2. The disk isn't saturated... the memory footprint of mongod keeps growing as replication continues. Then swap gets used more and more, and then oom-killer kicks in and shuts down mongod.
3. Let me know exactly which log files you want.

A bigger swap file isn't the solution here... even if it works, it'll make replication way too slow. Maybe there is a memory leak during replication, or memory just isn't being freed? It seems from the logs that there are different stages of replication; it would make sense to free up all allocated memory before moving on to the next stage.
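(Editor's aside: one hedged way to substantiate the growing-footprint observation is to log memory on a timer and attach the file to the ticket; the 60-second interval and /tmp path are arbitrary, and the mongo call may need --host/--port:)

while true; do
    date
    free -m
    ps -o pid,rss,vsz,cmd -C mongod
    mongo --quiet --eval 'printjson(db.serverStatus().mem)'
    sleep 60
done >> /tmp/mongod-mem.log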

Comment by siddharth.singh@10gen.com [ 20/Jun/12 ]

Hi,

Can you please post the following:
1) Data size and the size of the indexes, etc.
2) What does the memory footprint look like while indexing is going on? Output from free -m would help. Also, is the disk saturated when it happens?
3) Complete logs from the server would also be useful.

Finally, can you please try with a bigger swap file and report back? It might be helpful in the meantime while I go over the code and logs.

Thanks.
