[SERVER-18425] High Memory usage on Primary and secondary Created: 12/May/15  Updated: 16/Jul/17  Resolved: 14/May/15

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 3.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andy Walker Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

We have upgraded to version 3.0.2 and are experiencing very poor performance.

We are running a replica set, and the primary and one secondary are running at 90% memory, which is causing slow performance in our web application.

We have even created a new large box (a replica secondary) within AWS, and it is also running at 90% memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                            
25036 root      20   0  131g  13g  11g S  2.7 90.7 108:21.05 mongod  

When we restart the mongo process on the primary and trigger a failover to another secondary, performance improves. However, after a few hours the memory starts to rise and performance decreases.

We currently need to restart the mongod service on our preferred primary every 24 hours to maintain performance.
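For reference, the failover is triggered roughly as follows (a minimal sketch; the 60-second step-down window is just an example), after which the mongod service is restarted from the OS shell:

// run in the mongo shell on the current primary: step down for 60 seconds
// so that another secondary is elected, then restart the mongod service
ditno_replicaset:PRIMARY> rs.stepDown(60)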



 Comments   
Comment by Andy Walker [ 15/May/15 ]

Thanks Sam

Comment by Sam Kleinman (Inactive) [ 14/May/15 ]

From the output of currentOp, it looks like you have a large number of long-running aggregation operations that may be consuming a large amount of resources, which could explain the performance issues you're observing.

If you don't have an appropriate index, this could explain the long-running aggregation operations. Adding an index on this field might help these operations complete more quickly.
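As a sketch (the collection and field names below are placeholders; the real ones would come from the $match stage in your currentOp output):

// create an index on the field the aggregation filters on
db.orders.createIndex({ status: 1 })

// the aggregation's leading $match stage can then use that index
db.orders.aggregate([
    { $match: { status: "pending" } },
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
])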

Also, it might be worthwhile to audit your application to make sure that you're not accidentally issuing extra requests. You can also add the maxTimeMS option to the aggregation command so that long-running operations abort on the server, rather than continuing to use resources for requests that the client may no longer be waiting on.
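For example (again with placeholder collection and field names), maxTimeMS can be passed to the aggregation from the mongo shell; most drivers expose an equivalent option:

// abort the aggregation on the server if it runs longer than 30 seconds
db.orders.aggregate(
    [ { $match: { status: "pending" } } ],
    { maxTimeMS: 30000 }
)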

I'm going to go ahead and close this ticket: the SERVER project is for bugs and feature requests against the MongoDB server, and it doesn't look like there's an underlying server issue here. If you encounter a bug in MongoDB, feel free to open another ticket. For MongoDB-related support discussion, please post on the mongodb-users group or Stack Overflow with the mongodb tag.

Regards,
sam

Comment by Andy Walker [ 14/May/15 ]

Hi Sam

Please find the requested information below:

We do not use noTimeout option with cursors.

ditno_replicaset:PRIMARY> db.serverStatus().metrics.cursor
{
    "timedOut" : NumberLong(132),
    "open" : {
        "noTimeout" : NumberLong(0),
        "pinned" : NumberLong(240),
        "total" : NumberLong(242)
    }
}

The db.currentOp() output is attached.

Node "mongodb": "1.4.29"
pymongo 2.8

Comment by Andy Walker [ 14/May/15 ]

db.currentOp output

Comment by Sam Kleinman (Inactive) [ 13/May/15 ]

Hello,

This sounds like it could be related to an accumulation of open cursors that continue to consume resources but are no longer being used. To help us narrow down the cause of this issue, could you answer the following questions:

  • Do you use the noTimeout option with cursors? (See the documentation.)
  • Can you provide the output of the db.serverStatus().metrics.cursor operation in the mongo shell connected to a mongod instance that's experiencing the high memory usage?
  • Can you provide the output of db.currentOp(); during a period of high memory usage?
  • Which client driver(s) and version(s) are your application(s) using?

It's also true that MongoDB with the MMAPv1 storage engine will eventually use all of the available memory on a system for its cache, but this is largely an artifact of how the operating system reports memory use and does not by itself cause performance issues; see the documentation on this.
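To see the distinction, the mem section of serverStatus reports resident versus mapped memory, in MB (the figures below are illustrative only):

ditno_replicaset:PRIMARY> db.serverStatus().mem
{ "bits" : 64, "resident" : 13000, "virtual" : 134000, "supported" : true, "mapped" : 65000, "mappedWithJournal" : 130000 }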

Regards,
sam
