[SERVER-30858] OOM Killer Killing mongo Created: 27/Aug/17  Updated: 16/Nov/21  Resolved: 12/Feb/18

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: suresh narasimhan Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I have a 3-member replica set running MongoDB 3.2.10. Recently the OOM killer has been killing the mongod process. We upgraded the instance type to c3.2xlarge in AWS and the crash still happened. We did not have swap space configured at the time of the crash and have configured it since.

MMS shows about 300 connections to mongod. I am not clear why mongod uses so much RAM that the OOM killer kills it, given that the DB size is so small. I did see an expensive query that ran for 2 seconds and may have done a full table scan just before the OOM killer kill.

The incident happened at about 9:54 AM PST on Aug 26. I am attaching the mongod logs and diagnostic data:

https://drive.google.com/file/d/0B0DcYD8YgYOJMmZTcVNLUXlTdDQ/view?usp=sharing
https://drive.google.com/file/d/0B0DcYD8YgYOJYjhMczRxOFlRNEU/view?usp=sharing

https://cloud.mongodb.com/v2/50366375f1a5dd0b002fab66#host/replicaSet/5615733ee4b009c743f75edf

region_qa_11:SECONDARY> db.runCommand( { buildInfo: 1 } )
{
        "version" : "3.2.10",
        "gitVersion" : "79d9b3ab5ce20f51c272b4411202710a082d0317",
        "modules" : [ ],
        "allocator" : "tcmalloc",
        "javascriptEngine" : "mozjs",
        "sysInfo" : "deprecated",
        "versionArray" : [
                3,
                2,
                10,
                0
        ],
        "openssl" : {
                "running" : "OpenSSL 1.0.0-fips 29 Mar 2010",
                "compiled" : "OpenSSL 1.0.1e-fips 11 Feb 2013"
        },
        "buildEnvironment" : {
                "distmod" : "amazon",
                "distarch" : "x86_64",
                "cc" : "/opt/mongodbtoolchain/bin/gcc: gcc (GCC) 4.8.2",
                "ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp",
                "cxx" : "/opt/mongodbtoolchain/bin/g++: g++ (GCC) 4.8.2",
                "cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -Wno-maybe-uninitialized -std=c++11",
                "linkflags" : "-fPIC -pthread -Wl,-z,now -rdynamic -fuse-ld=gold -Wl,-z,noexecstack -Wl,--warn-execstack",
                "target_arch" : "x86_64",
                "target_os" : "linux"
        },
        "bits" : 64,
        "debug" : false,
        "maxBsonObjectSize" : 16777216,
        "storageEngines" : [
                "devnull",
                "ephemeralForTest",
                "mmapv1",
                "wiredTiger"
        ],
        "ok" : 1
}

db stats
{
        "db" : "region",
        "collections" : 6,
        "objects" : 5491878,
        "avgObjSize" : 406.2433517277696,
        "dataSize" : 2231038926,
        "storageSize" : 1094828032,
        "numExtents" : 0,
        "indexes" : 43,
        "indexSize" : 875229184,
        "ok" : 1
}
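For scale, the byte counts in the db stats output above can be converted to GiB. The snippet below is a minimal sketch in plain JavaScript that only does this arithmetic on the three values reported above. The helper name `toGiB` and the `summary` object are illustrative, not part of any MongoDB API.

```javascript
// Values copied from the db stats output above (all in bytes).
const stats = {
  dataSize: 2231038926,    // uncompressed size of the documents
  storageSize: 1094828032, // space allocated on disk for the collections
  indexSize: 875229184,    // total size of all indexes
};

const GiB = 1024 ** 3;

// Hypothetical helper: convert bytes to GiB, rounded to two decimals.
const toGiB = (bytes) => Math.round((bytes / GiB) * 100) / 100;

const summary = {
  dataGiB: toGiB(stats.dataSize),
  storageGiB: toGiB(stats.storageSize),
  indexGiB: toGiB(stats.indexSize),
  dataPlusIndexGiB: toGiB(stats.dataSize + stats.indexSize),
};

console.log(summary);
```

The uncompressed data plus indexes come to a little under 3 GiB, so the working set of this database on its own is small relative to the memory of the host.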



 Comments   
Comment by Kelsey Schubert [ 29/Sep/17 ]

Hi snarasimhan,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide us access to the syslog?

Thank you,
Kelsey

Comment by Kelsey Schubert [ 07/Sep/17 ]

Hi snarasimhan,

I do not have access to view these files. Would you please upload the requested files to our secure upload portal?

Thank you,
Thomas

Comment by suresh narasimhan [ 28/Aug/17 ]

https://docs.google.com/a/edmunds.com/document/d/1JUaKy0AmntEhqND6jaGCUPfEgsYA2qIR66BS1GilKag/edit?usp=sharing

Comment by Kelsey Schubert [ 28/Aug/17 ]

Hi snarasimhan,

Would you please provide the syslog covering this event?

Thank you,
Thomas

Comment by suresh narasimhan [ 27/Aug/17 ]

I was using the default WiredTiger cacheSize and have now restricted it to 3 GB (slightly more than the compressed data size).
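For context on what "default" means here: the MongoDB 3.2 documentation describes the default WiredTiger internal cache as the larger of 60% of RAM minus 1 GB, or 1 GB. The sketch below only reproduces that arithmetic in plain JavaScript. The function name is hypothetical, and the 15 GiB figure is an assumption based on the published c3.2xlarge specification, not something read from this host.

```javascript
// Sketch of the documented MongoDB 3.2 default for the WiredTiger cache:
// the larger of (60% of RAM minus 1 GB) or 1 GB.
// The function name is illustrative; it is not a MongoDB API.
function defaultWiredTigerCacheGB(ramGB) {
  return Math.max(0.6 * ramGB - 1, 1);
}

// Assumption: a c3.2xlarge instance has 15 GiB of RAM.
const assumedRamGB = 15;
const defaultCacheGB = defaultWiredTigerCacheGB(assumedRamGB);

// The limit chosen in the comment above.
const configuredCacheGB = 3;

console.log({ assumedRamGB, defaultCacheGB, configuredCacheGB });
```

Under that assumption the default works out to roughly 8 GB, so a 3 GB limit leaves considerably more headroom for connections, in-memory sorts, and the filesystem cache. The limit can be set with the `storage.wiredTiger.engineConfig.cacheSizeGB` configuration option or the `--wiredTigerCacheSizeGB` command-line option.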

Generated at Thu Feb 08 04:25:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.