Type: Bug
Resolution: Duplicate
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.0.3
Component/s: Replication
Labels: replicaset
Environment: Linux 3.0.18
Operating System: Linux

We initiated a new replica set, but the intended secondary never leaves the "RECOVERING" state. The mongod process is killed by the oom-killer partway through the resync (apparently in the last step, while building secondary indexes), and the resync starts from scratch every time.
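One way to confirm that the oom-killer is what ends the process is to search the kernel log for its kill records. The following is a minimal sketch, assuming the kernel writes lines of the form "Killed process <pid> (<name>)"; the function name `oom_kills_for` and the log file argument are hypothetical, and the log location varies by distribution.

```shell
#!/bin/sh
# Sketch: list oom-killer records for a given process name.
# Assumption: the kernel log contains lines such as
#   "Killed process <pid> (<name>) ..."
# oom_kills_for is a hypothetical helper, not part of any tool.
oom_kills_for() {
    name=$1      # process name as shown in the kernel log, e.g. mongod
    logfile=$2   # a saved kernel log, e.g. the output of dmesg
    # Match only kill records for the named process.
    grep -E "Killed process [0-9]+ \(${name}\)" "$logfile"
}
```

For example, `dmesg > /tmp/kernel.log` followed by `oom_kills_for mongod /tmp/kernel.log` prints the matching records, if any.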

Journaling is turned on, and vm.overcommit_memory is set to 1, as suggested before.

Right now we are testing "echo -17 > /proc/`cat /var/run/mongodb.pid`/oom_adj" (and "swapoff -a"), but every trial takes hours.
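The oom_adj workaround above can be wrapped so that it fails loudly when the pid file is missing or stale. This is a sketch only: the function name `protect_from_oom` and its optional second argument (an alternate proc root, handy for dry runs) are hypothetical, and writing to the real /proc requires root.

```shell
#!/bin/sh
# Sketch: exempt the process named in a pid file from the oom-killer
# by writing -17 to its oom_adj file, as in the command above.
# protect_from_oom and the proc_root argument are hypothetical.
protect_from_oom() {
    pidfile=$1
    proc_root=${2:-/proc}   # defaults to the real /proc
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    [ -d "$proc_root/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    # -17 is the value used in the report to disable oom-killing.
    printf '%s\n' -17 > "$proc_root/$pid/oom_adj"
}
```

Usage would be `protect_from_oom /var/run/mongodb.pid`. The setting applies to the running process only, so it has to be reapplied each time mongod is restarted.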

The data size is 10x larger than physical memory, so it seems unlikely that simply doubling the RAM would fix the problem, especially as the oom-killer's heuristics are rather unpredictable.

I'd like to know what triggers this failure, and what I should keep in mind.

What should we do to get the resync to complete?

duplicates
SERVER-6414 use regular file io, not mmap for external sort (Closed)

is related to
SERVER-6141 can't successfully replicate our shards anymore. replication isn't using memory efficiently and linux is invoking oom_killer to kill mongod. servers replicated earlier on same config (with smaller data sets) are still working fine... (Closed)

Assignee: siddharth.singh@10gen.com
Reporter: Kenn Ejima
Participants: Daniel Pasette, Eliot Horowitz, Kenn Ejima, Rakesh Sankar, siddharth.singh@10gen.com
Votes: 1
Watchers: 3

Created: Mar 14 2012 06:39:44 PM UTC
Updated: Aug 15 2012 02:04:17 PM UTC
Resolved: Jul 11 2012 07:39:50 PM UTC
