Core Server / SERVER-39909

mongod killed by oom-killer

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.12
    • Component/s: WiredTiger
    • Labels: None

      Hello,

      I am trying to gain insight into an ongoing issue we've been having. Some configuration info:

      • MongoDB 3.2.12
      • centos-release-6-8.el6.centos.12.3.x86_64 running in a 64GB VMware VM

      Two stand-alone instances of mongod run on this server: no sharding, no replication. There are several other memory consumers on this server, mostly 5 or 6 Java programs, only one of which consumes any significant memory (~16GB). That one Java program is the main application that depends on MongoDB.

      What we observe is everything working OK for 2-5 days and then the oom-killer decides to kill one of the mongods. This has happened 5-6 times over the last month. Typically, only 1 mongod is killed but there was at least one occasion where both were.

      Output from 'sar' shows the same pattern again and again: consumption of RAM, sometimes fairly rapid, over the 2-5 days, followed by ~61GB of RAM in use for a day or two and then the oom-killer does its thing.

      I should mention that I tried to constrain the WT cache size to 12GB for each of the two mongods. This seemed to prevent the oom-killer from firing, but our application became 'unresponsive'.
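      To make the memory budget concrete, here is a rough sketch in Python of the arithmetic for this host. It assumes the documented 3.2 default for the WiredTiger cache (the larger of 60% of RAM minus 1GB, or 1GB) and assumes each mongod sizes its cache from the full 64GB independently. The function names are hypothetical, and the figures ignore memory that mongod uses outside the WT cache (connections, sorts, filesystem cache), so treat it as an estimate only.

```python
# Rough memory budget for the host described above (all figures in GB).
# Assumption: MongoDB 3.2 default WiredTiger cache = max(60% of RAM - 1 GB, 1 GB),
# computed by each mongod from total RAM, unaware of the other instance.

TOTAL_RAM_GB = 64   # VM size from the report
JAVA_APP_GB = 16    # the one large Java consumer from the report
MONGOD_COUNT = 2    # two stand-alone mongod instances


def default_wt_cache_gb(ram_gb):
    """Assumed 3.2 default cache size for a single mongod."""
    return max(0.6 * ram_gb - 1.0, 1.0)


def committed_gb(cache_per_mongod_gb):
    """WT cache for all mongods plus the main Java application."""
    return MONGOD_COUNT * cache_per_mongod_gb + JAVA_APP_GB


default_cache = default_wt_cache_gb(TOTAL_RAM_GB)
default_total = committed_gb(default_cache)
capped_total = committed_gb(12)  # the 12GB-per-mongod cap that was tried

print(f"default cache per mongod: {default_cache:.1f} GB")
print(f"defaults: {default_total:.1f} GB wanted of {TOTAL_RAM_GB} GB")
print(f"12GB cap: {capped_total:.1f} GB wanted of {TOTAL_RAM_GB} GB")
```

      Under these assumptions the two default-sized caches alone could grow past physical RAM, while the 12GB cap leaves headroom. The cap corresponds to the storage.wiredTiger.engineConfig.cacheSizeGB setting (or --wiredTigerCacheSizeGB on the command line).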

      I should also mention that I've read a ton of MongoDB Jiras on this issue and, while I know that 3.2.12 is getting long in the tooth, many of the improvements in WT's memory management were supposedly, though not exclusively, in 3.2.10.

      As to our application's 'access patterns', it's difficult to be precise, but my sense is that it combines periodic bursts of write activity with the occasional (human-driven) reading of a very large collection. (In this regard I am familiar with the Jiras that discuss MongoDB threads turning their attention to cache eviction rather than servicing application requests, but I believe that this issue was improved in 3.2.10.)
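      One way to watch for the cache pressure mentioned above is to compare the cache counters in serverStatus with the configured maximum. The following is a minimal sketch in Python, not a diagnostic tool: the helper name cache_fill is hypothetical, the sample numbers are made up for illustration, and it only reads three counters from the wiredTiger.cache section of a serverStatus document.

```python
def cache_fill(server_status):
    """Return (used_fraction, dirty_fraction) of the WiredTiger cache.

    Expects a serverStatus document (as a dict) that contains the
    wiredTiger.cache section reported by mongod.
    """
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used_bytes = cache["bytes currently in the cache"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    return used_bytes / max_bytes, dirty_bytes / max_bytes


# Illustrative sample only; these numbers are invented, not from this ticket.
GIB = 2 ** 30
sample_status = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 12 * GIB,
            "bytes currently in the cache": 9 * GIB,
            "tracked dirty bytes in the cache": 3 * GIB,
        }
    }
}

used, dirty = cache_fill(sample_status)
print(f"cache used: {used:.0%}, dirty: {dirty:.0%}")
```

      With pymongo, a live document could be obtained with something like MongoClient(...).admin.command("serverStatus") and passed to the same helper. Sampling it periodically alongside 'sar' would show whether the cache sits near its maximum during the unresponsive periods.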

      In any event, I would be very grateful for some bright light on this ongoing and very frustrating problem. At a minimum, if I could upload the WT diagnostics and someone at MongoDB could run their internal visualizer against it - along with an analysis, that would be a good start.

      Thank you for your help.


            Assignee:
            daniel.hatcher@mongodb.com Danny Hatcher (Inactive)
            Reporter:
            saultocsin PMB
            Votes:
            0
            Watchers:
            5
