  1. Core Server
  2. SERVER-33809

Secondary member crashed due to OOM in production

Details

    • Type: Question
    • Resolution: Works as Designed
    • Priority: Major - P3

    Description

      Hi all,
      We have a MongoDB replica set configured with 6 members, all in the same subnet. Our application is configured to prefer reads from secondaries.
      Yesterday one of the nodes ran out of memory, and the kernel's OOM killer killed the mongod process.

      mongod is the only thing we run on the machine (using Docker). I've checked the system logs and found this:
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393104] conn14048276 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393109] conn14048276 cpuset=aa6b9fb618f6296d15a964eea9cab273f3d9476fc66e3b24d2ca4ec8e2784e73 mems_allowed=0
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393112] CPU: 0 PID: 9200 Comm: conn14048276 Not tainted 3.13.0-88-generic #135-Ubuntu
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393113] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
      ........
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.393323] Out of memory: Kill process 4340 (mongod) score 912 or sacrifice child
      Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.404537] Killed process 4340 (mongod) total-vm:7052416kB, anon-rss:3681108kB, file-rss:0kB
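
      To put the kernel's figures in more familiar units, the "Killed process" line above can be parsed and converted from kB to GiB. The following is only a minimal sketch: the function name `parse_oom_kill` and the regular expression are illustrative assumptions, written against the exact line format shown in this log (other kernel versions print additional fields).

```python
import re

# Pattern written against the "Killed process" line quoted above.
# Other kernel versions may print extra fields; this is an assumption.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name, and memory figures in GiB, or None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 * 1024  # the kernel reports these values in kB (KiB)
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": int(m.group("total_vm")) / kib_per_gib,
        "anon_rss_gib": int(m.group("anon_rss")) / kib_per_gib,
        "file_rss_gib": int(m.group("file_rss")) / kib_per_gib,
    }


line = (
    "Mar 11 22:40:02 ip-xxx-xx-x-xxx kernel: [10313490.404537] "
    "Killed process 4340 (mongod) total-vm:7052416kB, "
    "anon-rss:3681108kB, file-rss:0kB"
)
info = parse_oom_kill(line)
print(
    f"{info['name']} (pid {info['pid']}): "
    f"anon-rss {info['anon_rss_gib']:.2f} GiB, "
    f"total-vm {info['total_vm_gib']:.2f} GiB"
)
# prints: mongod (pid 4340): anon-rss 3.51 GiB, total-vm 6.73 GiB
```

      In other words, mongod held roughly 3.5 GiB of resident anonymous memory when it was killed.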

      I do have the diagnostic data but am not sure how to analyse it (see attached). We don't have any swap configured on the machine; is it recommended to add some?
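
      Whether swap is configured on a Linux host can be confirmed from `/proc/meminfo`. The sketch below is illustrative only: the helper names `parse_meminfo` and `swap_summary` are assumptions, and it only reports what the host exposes; it does not say whether swap should be added.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for raw in text.splitlines():
        key, sep, rest = raw.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_summary(meminfo):
    """Summarise swap configuration from a parsed meminfo dict."""
    total = meminfo.get("SwapTotal", 0)
    free = meminfo.get("SwapFree", 0)
    return {
        "swap_configured": total > 0,
        "swap_total_kb": total,
        "swap_used_kb": total - free,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(swap_summary(parse_meminfo(f.read())))
    except FileNotFoundError:
        print("no /proc/meminfo on this platform")
```

      On a host without swap, `SwapTotal` is 0 and `swap_configured` comes back `False`.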

      It's a production machine and this is the first time we have encountered such a failure, so it's important for us to understand the root cause and what is needed to avoid such cases.

      Thanks

      Attachments

        1. metrics.2018-03-11T03-33-58Z-00000.gz
          9.93 MB
        2. metrics.2018-03-11T14-13-58Z-00000.gz
          7.65 MB
        3. metrics.2018-03-11T22-40-51Z-00000.gz
          9.95 MB
        4. SERVER-33809.png
          252 kB

        Activity

          People

            dmitry.agranat@mongodb.com Dmitry Agranat
            roiey Roie Yossef
