[SERVER-21928] Mongos uses too much memory Created: 17/Dec/15 Updated: 13/Jul/16 Resolved: 13/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.0.4, 3.0.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Steffen | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 4 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: |
Description: Ubuntu 14.04.2 LTS |
| Operating System: | Linux |
| Participants: |
| Description |
|
On one of our hosts the mongos process uses more and more memory until the OOM killer kills it. Our cluster runs MongoDB version 3.0.4. We have 25 shards.
At this point, after running for 6 days, it uses 22GB of memory with 5 active connections. We also tested version 3.0.7 of mongos on this host with the same result. We think this is a bug in mongos: it doesn't release memory. |
| Comments |
| Comment by Kelsey Schubert [ 13/Jul/16 ] |
|
Hi steffen, Thanks for the clarification. From our investigation, we haven't seen enough information here to indicate a bug in either the MongoDB server or the PHP driver. There are a number of possible explanations for the memory consumption you are observing that would originate from the application layer (e.g. too many open cursors). For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group. Please see also our Technical Support page for additional support resources. Kind regards, |
| Comment by Steffen [ 28/Apr/16 ] |
|
Hi, our router hasn't restarted since Tue Feb 9 09:11:50 2016. The problem only occurred on the mongos that our PHP scripts run against. The OOM killer may no longer be triggered because the memory footprint is lower or because the leak is no longer unbounded. |
| Comment by Kelsey Schubert [ 25/Apr/16 ] |
|
Hi rschwarzberg and steffen, We still need the information Ramon requested to diagnose the problem. If this is still an issue for you, can you please upload the logs and clarify whether all of your mongos are affected? Thank you, |
| Comment by Ramon Fernandez Marina [ 28/Mar/16 ] |
|
rschwarzberg, can you please upload logs for an affected mongos from startup until it gets killed? Also, are all your mongos affected or only some of them? steffen, are you still having some mongos processes killed by the OOM killer? |
| Comment by Robert Schwarzberg [ 10/Feb/16 ] |
|
We added a MongoClient::killCursor() call after each batch was processed (currently 1000 items per query) and then created a new cursor for the next batch. It looks like the problem still exists, though. |
| Comment by Steffen [ 08/Feb/16 ] |
|
We will reevaluate this. |
| Comment by Ramon Fernandez Marina [ 06/Feb/16 ] |
|
Have you tried Jeremy's suggestion above of using MongoClient::killCursor()? Since your other mongos are not being impacted, this all points to too many cursors being accumulated by the execution of these scripts. |
| Comment by Steffen [ 03/Feb/16 ] |
|
Hello, we upgraded to MongoDB 3.0.9 and we still have the memory issues. |
| Comment by Steffen [ 22/Dec/15 ] |
|
We are running another test and haven't hit any exceptions so far. Still the same result on the mongo router.
Again exceptions in the mongos log:
I just heard that we are using a fork of this Doctrine ODM: https://github.com/researchgate/mongodb-odm. The script has finished and we hit no exceptions. Memory usage on the router is still high.
|
| Comment by Jeremy Mikola [ 21/Dec/15 ] |
|
Do you have any insight into whether this catch block is being reached, and what the exceptions might be?
If the remove operation were to throw, that would abandon the original cursor you were iterating and it's possible that there were still batches waiting to be fetched on the server side (i.e. the cursor would be left open). This may be a case where you'd be justified in using MongoClient::killCursor() to clean up the old cursor manually before you start a new query to pick up where you left off. Since you're using a foreach, it wouldn't be feasible to simply restart iteration on the same $cursor, since that would re-execute a new query. |
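The failure mode described in this comment can be illustrated with a small, language-neutral toy model. This is only a sketch under stated assumptions: `ToyServer`, `process`, `kill_cursor` and `failing_handle` are hypothetical names invented for illustration, the "server" is an in-memory dictionary, and none of this is the PHP driver or mongos API. It shows that an exception raised mid-iteration leaves a partially read cursor open on the server side unless the client kills it explicitly.

```python
class ToyServer:
    """In-memory stand-in for a server that tracks open cursors (hypothetical, not a MongoDB API)."""

    def __init__(self, docs, batch_size):
        self.docs = docs
        self.batch_size = batch_size
        self.open_cursors = {}  # cursor id -> index of the next unread document
        self._next_id = 1

    def find(self):
        """Open a cursor and return its id."""
        cursor_id = self._next_id
        self._next_id += 1
        self.open_cursors[cursor_id] = 0
        return cursor_id

    def get_more(self, cursor_id):
        """Return the next batch; the cursor is closed once it is exhausted."""
        start = self.open_cursors[cursor_id]
        batch = self.docs[start:start + self.batch_size]
        end = start + len(batch)
        if end >= len(self.docs):
            del self.open_cursors[cursor_id]
        else:
            self.open_cursors[cursor_id] = end
        return batch

    def kill_cursor(self, cursor_id):
        """Explicitly release a cursor that will not be read any further."""
        self.open_cursors.pop(cursor_id, None)


def process(server, handle, cleanup):
    """Iterate a cursor; on failure, optionally kill it before giving up."""
    cursor_id = server.find()
    try:
        while cursor_id in server.open_cursors:
            for doc in server.get_more(cursor_id):
                handle(doc)
    except ValueError:
        # Stand-in for the exception thrown by the remove operation.
        if cleanup:
            server.kill_cursor(cursor_id)


def failing_handle(doc):
    if doc == 4:
        raise ValueError("simulated failure while removing a document")


leaky = ToyServer(list(range(10)), batch_size=3)
process(leaky, failing_handle, cleanup=False)
print(len(leaky.open_cursors))  # 1: the abandoned cursor is still open

tidy = ToyServer(list(range(10)), batch_size=3)
process(tidy, failing_handle, cleanup=True)
print(len(tidy.open_cursors))  # 0: the cursor was killed in the handler
```

In the toy model each abandoned cursor stays open indefinitely. Whether abandoned cursors actually explain the memory growth reported in this ticket is not established by the sketch.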
| Comment by Steffen [ 21/Dec/15 ] |
|
We tested with another router on a different host. The script was still started and run on the same host as before. I restarted this router before we started the test, so its memory was clean.
We started the script and memory usage began to rise.
In the router log I saw these interesting messages:
After we stopped the script, memory usage stayed at a high level. Cron started the script another time and the router was killed.
Here is the code which the scripts run:
|
| Comment by Steffen [ 21/Dec/15 ] |
|
Regarding our PHP Mongo connections: we just use a simple
We also follow this suggestion: http://stackoverflow.com/questions/17839962/persistent-connection-or-connection-pooling-in-php54-nginx-phpfpm-mongodb/17840527#17840527 When our PHP script is finished, its connections are closed. Most of our scripts run for a short amount of time; some run for hours. Test results (with another router) will follow today. |
| Comment by Steffen [ 18/Dec/15 ] |
|
FYI: We will do some tests on Monday. |
| Comment by Jeremy Mikola [ 17/Dec/15 ] |
|
If possible, can you share how the driver is being used from the PHP script? With only five open connections on mongos, I'd assume the script isn't abandoning open connections (as it might with a client-side socket timeout). Is this only an issue when mongos is running on the same machine as the PHP cron jobs? Would it be possible to test if the same PHP cron jobs connecting to a remote mongos trigger the memory increases (obviously, this would come at the cost of an extra network hop)? |