[SERVER-5356] mongos OOM Created: 22/Mar/12  Updated: 15/Aug/12  Resolved: 10/Jul/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: guojiangyong Assignee: Randolph Tan
Resolution: Incomplete Votes: 0
Labels: mongos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongodb(cnf)+mongodb(db)+mongos


Attachments: JPEG File mornitor.JPG    
Operating System: Linux
Participants:

 Description   

mongodb(cnf)+mongodb(db)+mongos
mongos 2.0.2 always runs out of memory (OOM).
We have to restart it every day because it holds memory and does not free it.



 Comments   
Comment by Ian Whalen (Inactive) [ 03/May/12 ]

@guojiangyong, can you possibly get these machines into MMS and also add some clarification to what kind of ops you are running on these machines?

Comment by Randolph Tan [ 03/Apr/12 ]

Hi,

Would it be possible to run these machines on MMS (http://wiki.mongodb.org/display/DOCS/MongoDB+Monitoring+Service)? It would also be very helpful if you could describe what kind of operations you are running on these machines.

Comment by guojiangyong [ 31/Mar/12 ]

1. It is the physical memory usage.
Yellow is the physical cache.
Usually the cache is used by mongodb.
mongos leaks memory, so it slowly becomes smaller.
In the end it goes OOM.
2. 20-30 connections when it runs out of memory.
3. I don't know.
4. No.
5. No swap partition. It only has 16G of physical memory.

Comment by Randolph Tan [ 29/Mar/12 ]

@guojiangyong - Some questions:

1. What is the graph plotting? Is it the physical memory usage for the machine? or mongos? or mongod? or mongos + mongod?

@patrick & @guojiangyong

2. How many connections do you have when it runs out of memory?
3. What kinds of tasks do you run on it? Are you running map reduce commands when it runs out of memory?
4. Are you running this on MMS?
5. How big is your swap partition?
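
One way to answer question 1 is to sample the resident memory of each process separately instead of the whole machine. A minimal sketch in Python, assuming a Linux host where /proc/<pid>/status is readable; `parse_vm_rss_kb` and `read_rss_kb` are hypothetical helper names, and the status text below is made-up sample data, not output from the reporter's servers.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the resident set size (kB) of a live process on Linux."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())


# Made-up sample of a status file, for illustration only.
SAMPLE_STATUS = "Name:\tmongos\nVmPeak:\t 9000000 kB\nVmRSS:\t 2048000 kB\n"
sample_rss_kb = parse_vm_rss_kb(SAMPLE_STATUS)
print(sample_rss_kb)
```

Calling `read_rss_kb` periodically with the PID of mongos and of each mongod, and logging the results, would show which process is actually growing.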

Comment by Patrick Neff [ 29/Mar/12 ]

edit: I think this is an overcommit issue; let me fix that and see if it works better.

I'm having the same problem but with a capped collection. My system has 16GB of RAM and the collection is set to 8GB, with a resulting 3.5GB of indexes. This should fit in memory just fine, and it does for about 12 hours. Then it gets OOM'd. Typical load is much higher during the day than at night, and it runs fine under load. The system is under relatively light load when it gets OOM'd. Heck, I was shrinking the size of the capped collection to 8GB to see if a smaller collection would fix it, and it was somehow using greater than 100% of memory and everything still ran fine.

CentOS 6 64-bit
mongodb 2.0.2
16gb RAM
Not in replica set yet.
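
A quick way to check the overcommit theory is to compare the CommitLimit and Committed_AS fields in /proc/meminfo, alongside the mode in /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always allow, 2 = strict). A minimal sketch in Python; `parse_meminfo` and `commit_headroom_kb` are hypothetical helper names, and the sample values are made up rather than taken from this system. Note that the kernel only enforces CommitLimit in mode 2.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value} (most fields are in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """CommitLimit minus Committed_AS; negative means more is committed than the limit."""
    return info["CommitLimit"] - info["Committed_AS"]


# Made-up sample values, for illustration only.
SAMPLE_MEMINFO = (
    "MemTotal:       16330000 kB\n"
    "CommitLimit:     8165000 kB\n"
    "Committed_AS:    9000000 kB\n"
)
sample_info = parse_meminfo(SAMPLE_MEMINFO)
sample_headroom = commit_headroom_kb(sample_info)
print(sample_headroom)

# On a live Linux host (not run here):
#   with open("/proc/meminfo") as f:
#       print(commit_headroom_kb(parse_meminfo(f.read())))
```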

Comment by guojiangyong [ 26/Mar/12 ]

First, thank you very much!
These servers are running in production.
CentOS 5.4 (64-bit)
No replica sets.
server1 mongod(shard db,port:27018)+ mongodb(config,port:27019)
server2 mongod(shard db,port:27018)+ mongodb(config,port:27019)
server3 mongod(shard db,port:27018)+ mongodb(config,port:27019) + mongos(port:27017)
server3 mongod(shard db,port:27018)+ mongos(port:27017)

strace -c -f -p <mongos PID>

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.96    2.574478         529      4863         0 recvfrom
 16.02    0.557728         307      1814         0 select
  9.71    0.337935       24138        14         0 nanosleep
  0.09    0.002998        2998         1         0 restart_syscall
  0.06    0.002256           1      1945         0 sendto
  0.04    0.001302           1      1939         0 write
  0.03    0.001169           1       966         0 clone
  0.03    0.001027           1       973         0 close
  0.02    0.000640           0      2898         0 setsockopt
  0.01    0.000428           1       417       109 futex
  0.01    0.000370           0       966         0 accept
  0.01    0.000236           0      1932         0 getsockopt
  0.00    0.000119           0       966         0 getrlimit
  0.00    0.000107           0       966         0 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    3.480793                 20660       109 total
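
To compare syscall counts across several samples, the summary can be parsed with a short script. A minimal sketch in Python, assuming the `strace -c` output has been saved as text; `parse_strace_summary` is a hypothetical helper, not part of strace or MongoDB. In the sample above, clone and accept were each called 966 times, which would be consistent with one thread being created per accepted connection.

```python
def parse_strace_summary(text):
    """Parse `strace -c` summary rows into {syscall: {"seconds", "calls", "errors"}}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a percentage; this skips the header and dashed rules.
        if len(parts) < 5 or not parts[0].replace(".", "", 1).isdigit():
            continue
        # Skip the final "total" row.
        if parts[-1] == "total":
            continue
        # Columns: % time, seconds, usecs/call, calls, [errors], syscall.
        # strace may leave the errors column blank, giving five fields.
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows[parts[-1]] = {
            "seconds": float(parts[1]),
            "calls": int(parts[3]),
            "errors": errors,
        }
    return rows


# A few rows copied from the summary above.
SAMPLE = """\
% time seconds usecs/call calls errors syscall
73.96 2.574478 529 4863 0 recvfrom
0.03 0.001169 1 966 0 clone
0.01 0.000428 1 417 109 futex
0.01 0.000370 0 966 0 accept
100.00 3.480793 20660 109 total
"""
stats = parse_strace_summary(SAMPLE)
print(stats["clone"]["calls"], stats["accept"]["calls"])
```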

Comment by Randolph Tan [ 22/Mar/12 ]

Hi,

I noticed that you set the Tests Written field as Complete. Would you mind attaching the test?

Can you also provide more details about your environment?

1. Number of shards (and whether they are replica sets)
2. Number of config servers
3. OS

If you don't have the test, can you describe the type of load you have in your system? For example, you run several concurrent map reduce jobs, etc.

Generated at Thu Feb 08 03:08:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.