Details
Question
Resolution: Works as Designed
Major - P3
Description
Hi all,
We have a MongoDB replica set with 6 members, all in the same subnet. Our application is configured to prefer reads from secondaries.
Yesterday one of the nodes ran out of memory, and the kernel's OOM killer killed the mongod process.
mongod is the only thing we run on the machine (in Docker). I checked the system logs and found this:
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393104] conn14048276 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393109] conn14048276 cpuset=aa6b9fb618f6296d15a964eea9cab273f3d9476fc66e3b24d2ca4ec8e2784e73 mems_allowed=0
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393112] CPU: 0 PID: 9200 Comm: conn14048276 Not tainted 3.13.0-88-generic #135-Ubuntu
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393113] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
........
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393323] Out of memory: Kill process 4340 (mongod) score 912 or sacrifice child
Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.404537] Killed process 4340 (mongod) total-vm:7052416kB, anon-rss:3681108kB, file-rss:0kB
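To make the sizes in the last log line easier to read, here is a minimal Python sketch that parses the "Killed process" line and converts the kB figures to GiB. The regular expression and the helper names (`parse_oom_kill`, `kb_to_gib`) are my own and assume the line format shown above; other kernel versions may print additional fields.

```python
import re

# Line copied verbatim from the kernel log above.
LINE = (
    "Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.404537] "
    "Killed process 4340 (mongod) total-vm:7052416kB, "
    "anon-rss:3681108kB, file-rss:0kB"
)

# Assumed format of the "Killed process" line on this kernel (3.13).
PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return the pid, process name and memory sizes (kB), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = {"pid": int(match["pid"]), "name": match["name"]}
    for key in ("total_vm", "anon_rss", "file_rss"):
        info[key + "_kb"] = int(match[key])
    return info


def kb_to_gib(kb):
    """Convert kB (1024-byte units, as the kernel reports) to GiB."""
    return kb / (1024 * 1024)


info = parse_oom_kill(LINE)
print(info["name"], info["pid"])
print("anon-rss: %.2f GiB" % kb_to_gib(info["anon_rss_kb"]))
print("total-vm: %.2f GiB" % kb_to_gib(info["total_vm_kb"]))
```

For the line above this gives roughly 3.51 GiB of resident anonymous memory and 6.73 GiB of virtual memory for mongod at the time it was killed.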
I have the diagnostic data but am not sure how to analyze it (see attached). The machine has no swap configured; is adding swap recommended?
This is a production machine and the first time we have encountered such a failure, so it is important for us to understand the root cause and what is needed to avoid it in the future.
Thanks