Core Server / SERVER-41066

All cluster PRIMARY mongod processes were killed by the OOM killer within a few seconds

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 4.0.4
    • Component/s: Internal Code
    • Labels: None
    • ALL

      We had several incidents in which the PRIMARY mongod of every heavily loaded cluster was killed by the OOM killer within a few seconds.

      On the graph, the mongod process appeared to double its memory usage from 64GB within a few seconds.

      During the investigation, we found log lines like:

      2019-04-23T20:39:33.160+0000 E -        [conn1790578] Assertion: BSONObjectTooLarge: BSONObj size: 66053215 (0x3EFE45F) is invalid. Size must be between 0 and 16793600(16MB) First element: stage: "OR" src/mongo/bson/bsonobj.cpp 102
      

      and:

      2019-04-23T20:39:33.283+0000 I COMMAND  [conn1790578] warning: log line attempted (11390kB) over max size (10kB), printing beginning and end
      

      We limited the batch size in one of our service tasks, which reduced the BSON size, and the problem was solved.

      We also found in the logs a few memory consumption spikes with similar symptoms that did not trigger the OOM killer. This does not look like a memory leak, since memory consumption returned to its normal level afterwards.
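
      For illustration only, the general idea of capping the batch size can be sketched in Python as follows. This is not the actual service code: the helper name `chunked`, the sample IDs, and the batch size of 4 are hypothetical, and the sketch assumes the oversized BSON came from sending too many items in a single request.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage: issue one smaller request per batch
# instead of a single request that carries every item.
ids = range(10)
batches = list(chunked(ids, 4))
```

      Each batch would then be sent as its own request, so no single request grows with the total number of items.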

      In terms of mongod behavior, I am concerned about the following points:

      • Assertion: BSONObjectTooLarge - it seems to me that the size check occurs only after the object has been loaded into memory;
      • log line attempted (...) over max size - it looks like the whole line is built in memory, although only 10kB is printed.
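
      To make the two concerns concrete, here is a minimal Python sketch of the behavior I would expect. It is not mongod code and says nothing about how mongod is implemented. It relies only on the fact that a BSON document begins with a little-endian int32 holding its total size, and on the 16793600-byte and 10kB limits quoted in the log lines above. The function names `read_bson_checked` and `bounded_log_line`, the minimum size constant, and the " ... " separator are my own assumptions.

```python
import io
import struct

MAX_BSON_SIZE = 16793600  # upper limit quoted in the assertion message
MIN_BSON_SIZE = 5         # int32 length prefix plus the trailing NUL byte


def read_bson_checked(stream, max_size=MAX_BSON_SIZE):
    """Read one BSON document, validating its declared size first."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated BSON length prefix")
    (size,) = struct.unpack("<i", header)
    # Reject the document before allocating a buffer for its body.
    if not MIN_BSON_SIZE <= size <= max_size:
        raise ValueError(f"BSON size {size} is outside {MIN_BSON_SIZE}..{max_size}")
    body = stream.read(size - 4)
    if len(body) != size - 4:
        raise ValueError("truncated BSON document")
    return header + body


def bounded_log_line(parts, max_chars=10 * 1024):
    """Join string parts, keeping only the beginning and end if too long.

    Memory use stays near max_chars plus the largest single part,
    because the full line is never assembled.
    """
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    half = max_chars // 2
    head, tail, total = "", "", 0
    for part in parts:
        total += len(part)
        if len(head) < half:
            take = half - len(head)
            head += part[:take]
            part = part[take:]
        # Keep only the last `half` characters seen after the head.
        tail = (tail + part)[-half:]
    if len(head) + len(tail) == total:
        return head + tail  # nothing was dropped
    return head + " ... " + tail
```

      With checks of this kind, an object declaring a size of 66053215 bytes would be rejected after reading 4 bytes, and an 11390kB log line would never need to exist in memory in full.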

            Assignee: Eric Sedor (eric.sedor@mongodb.com)
            Reporter: Artem (bozaro)
