[SERVER-6785] Possible mongos memory leak
Created: 16/Aug/12 | Updated: 11/Jul/16 | Resolved: 21/Aug/12

| Status:             | Closed             |
| Project:            | Core Server        |
| Component/s:        | Sharding           |
| Affects Version/s:  | 2.2.0-rc1          |
| Fix Version/s:      | 2.2.0-rc2, 2.3.0   |
| Type:               | Bug                | Priority: | Major - P3 |
| Reporter:           | Jonathan Schneider | Assignee: | Ben Becker |
| Resolution:         | Done               | Votes:    | 0          |
| Labels:             | None               |
| Remaining Estimate: | Not Specified      |
| Time Spent:         | Not Specified      |
| Original Estimate:  | Not Specified      |
| Environment:        | CentOS 6.2         |
| Attachments:        |                    |
| Issue Links:        |                    |
| Operating System:   | Linux              |
| Participants:       |                    |
Description
I set up a 2-shard cluster with auth. With a continuous query load of around 1,000 queries/sec, the mongos process keeps chewing up memory until the OS kills the process:

In this case I had upgraded the test server to 8 GB of RAM to make sure it would still occur, and with 8 GB it took right around an hour to use up all the memory and swap space before the OS killed it.
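For reference, a minimal sketch of the kind of load and monitoring described above, assuming a mongos on localhost:27017 and a test collection test.foo (both placeholders, not from the original report): one terminal watches mongos memory with mongostat while a loop issues simple queries through mongos.

```
# Terminal 1: watch mongos memory (vsize/res columns) every 5 seconds.
mongostat --host localhost:27017 5

# Terminal 2: generate a continuous query load through mongos.
# Each mongo invocation runs 1000 findOne() calls against test.foo.
while true; do
  mongo --quiet --host localhost:27017 test \
    --eval 'for (var i = 0; i < 1000; i++) { db.foo.findOne({ x: i }); }'
done
```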
Comments
Comment by Azat Khuzhin [ 22/Aug/12 ]

I have about the same results.
Comment by Jonathan Schneider [ 22/Aug/12 ]

I compiled the nightly with the changes, and initial testing looks good. I cranked my test up to 2k queries/sec and am unable to duplicate the memory leak; mongos remained at about 1% memory usage on an 8 GB test server. Thanks, guys.
Comment by Azat Khuzhin [ 21/Aug/12 ]

I've compiled new binaries with Eliot's changes and upgraded mongos on the server.
Comment by auto [ 21/Aug/12 ]

Author: Eliot Horowitz <eliot@10gen.com>, 2012-08-21T15:28:40-07:00
Message:
Comment by auto [ 21/Aug/12 ]

Author: Eliot Horowitz <eliot@10gen.com>, 2012-08-21T15:28:40-07:00
Message:
Comment by Greg Studer [ 21/Aug/12 ]

We've tracked one issue down: it's related to querying non-sharded collections via mongos. There's no build with the fix yet (coming soon), but if this is the only problem you're experiencing, you should see stable mongos memory usage if you shard all collections (for example, by _id) before repeatedly querying them in the tests.
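A minimal sketch of this workaround, assuming a mongos reachable on localhost:27017 and a collection named test.logs (placeholder names): enable sharding for the database and shard the collection on _id before the test queries run.

```
# Shard the test collection on _id via the mongo shell, run against mongos.
mongo --host localhost:27017 --eval '
  sh.enableSharding("test");                    // allow sharding for the db
  sh.shardCollection("test.logs", { _id: 1 });  // shard key: _id, as suggested
  sh.status();                                  // confirm the collection is sharded
'
```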
Comment by Jonathan Schneider [ 21/Aug/12 ]

Glad the problem has been observed. Let us know how it goes; I'd love to test a new build that might correct this issue.
Comment by Ben Becker [ 21/Aug/12 ]

Thanks, Azat. I was able to successfully generate heap profiles based on this binary. The issue indeed appears to be related to memory ownership between the ShardCursor (DBClientCursor) and the ParallelSortClusteredCursor. I've attached the heap profiler results in PDF format.
Comment by Azat Khuzhin [ 21/Aug/12 ]

Hi Ben, here it is.
Comment by Ben Becker [ 21/Aug/12 ]

Hi Azat, could you also supply the mongos binary used to generate the heap profiler output? Thanks!
Comment by Azat Khuzhin [ 21/Aug/12 ]

You are welcome.
Comment by Ben Becker [ 20/Aug/12 ]

Hi Azat, thank you for posting the results; I'm analyzing them now. Could you post the git hash of the mongos instance you ran with? Thanks!
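For anyone following along, one way to find this (the binary path and host below are placeholders): the git hash is printed by the binary itself and is also available from a connected shell.

```
# Print the version and git hash of the binary on disk.
/usr/local/bin/mongos --version

# Or ask the running mongos from the mongo shell.
mongo --host localhost:27017 --eval 'print(db.serverBuildInfo().gitVersion)'
```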
Comment by Azat Khuzhin [ 20/Aug/12 ]

I've attached my results.
Comment by Azat Khuzhin [ 20/Aug/12 ]

I've already handled it myself.
Comment by Azat Khuzhin [ 20/Aug/12 ]

Ben, I could try the google perf tools heap profiler if you tell me how.
Comment by Jonathan Schneider [ 17/Aug/12 ]

Yes, I wiped the logfile before starting the test. I will see how much I can compress it... If it's not done before I have to leave today, I will get it over to you on Monday. I doubt it will get under 150 MB, so if you could email me the SCP credentials, that would be great. The email on my Jira account here will work fine.
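A small sketch of one way to get a multi-gigabyte log under the 150 MB attachment limit (file names are placeholders): compress first, then split the archive into chunks that can be attached individually and re-joined on the other end.

```
gzip -9 mongos-verbose.log                                   # compress in place -> .gz
split -b 140m mongos-verbose.log.gz mongos-verbose.log.gz.   # 140 MB chunks: .aa, .ab, ...
# Reassemble later with: cat mongos-verbose.log.gz.* > mongos-verbose.log.gz
```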
Comment by Ben Becker [ 17/Aug/12 ]

Hi Jonathan, was that 8 GB just from the last run? Could you try compressing it and attaching it to this ticket (the limit is 150 MB)? Otherwise I will send you an SCP address for uploading the files. Thanks!
Comment by Jonathan Schneider [ 17/Aug/12 ]

Reproduced.

mongostat output right before it was killed:

The log is 8 GB; how do you want it?
Comment by Jonathan Schneider [ 17/Aug/12 ]

I have restarted the service with the --vvvvv option and unleashed the load; it will take an hour or so to kill it. Since the logfile is extremely verbose, is there anything in particular I should be looking for, or do you want a copy of the whole thing after it gets killed? I am only here a half day today, but I could try the memory profiling on Monday.
Comment by Ben Becker [ 17/Aug/12 ]

Hi Jonathan, if this is repeatable, would you be able to supply a verbose log from mongos by running with '--vvvvv'? Also, would it be possible to try a debug build of mongos and provide results from the google perf tools heap profiler? If so, I will provide the binary and instructions for collecting heap profile data. Many thanks!
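A sketch of the kind of data collection being requested, assuming a mongos binary linked against tcmalloc with heap profiling support (google perf tools); the config server list, paths, and file names below are placeholders, not the reporter's actual setup.

```
# Restart mongos with maximum log verbosity; with a tcmalloc-profiled build,
# setting HEAPPROFILE also makes the allocator dump periodic .heap snapshots.
HEAPPROFILE=/tmp/mongos.hprof \
    mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 \
           --logpath /var/log/mongos-verbose.log --vvvvv

# After reproducing the memory growth, turn a snapshot into a report with pprof.
pprof --pdf /usr/local/bin/mongos /tmp/mongos.hprof.0001.heap > mongos-heap.pdf
```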