[SERVER-35642] We're seeing increasing memory usage on secondary members. Created: 18/Jun/18 Updated: 27/Oct/23 Resolved: 22/Aug/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.6.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ivan Filimonov | Assignee: | William Schultz (Inactive) |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |
Basically, we're seeing the same symptoms as in https://jira.mongodb.org/browse/SERVER-26312. On the attached CloudWatch chart the blue line is the primary and orange is a secondary. The third node isn't monitored at the moment, but its resident memory usage ({{db.serverStatus().mem.resident}}) was around 2833 MB (the metric is reported in megabytes), which is very close to the other secondary (orange). The memory allocated for each node is 4 GB (t2.medium).
Hope it helps. Please let me know if you need more information.
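For reference, a minimal sketch of how that per-member figure can be read from the mongo shell (the hostname below is a placeholder, not one of the reporter's nodes):

{code:javascript}
// Connect to a member and read its resident/virtual memory (both reported in MB).
// "node-e.example.com" is a placeholder hostname.
var conn = new Mongo("node-e.example.com:27017");
var mem = conn.getDB("admin").serverStatus().mem;
print("resident (MB): " + mem.resident + ", virtual (MB): " + mem.virtual);
{code}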
| Comments |
| Comment by Tess Avitabile (Inactive) [ 22/Aug/18 ] |
Fixed by
| Comment by Ivan Filimonov [ 23/Jul/18 ] |
Well, it's not bugging me anymore, so if you aren't going to fix it, then yes, it can be closed. I'd suggest at least mentioning somewhere in the documentation that users need to upgrade the replica set protocol version if their deployments were created with the old one.
| Comment by William Schultz (Inactive) [ 23/Jul/18 ] |
sunrize531@gmail.com Can this ticket be closed? See
| Comment by Bruce Lucas (Inactive) [ 28/Jun/18 ] |
Thanks for the confirmation, Ivan, and thank you for reporting this issue.
| Comment by Ivan Filimonov [ 27/Jun/18 ] |
| Comment by Ivan Filimonov [ 26/Jun/18 ] |
It's currently going down, slowly, but that's how it has behaved the whole time: it goes up, then down, then regains what it freed and gains a little more. I will update in a few hours.
| Comment by Bruce Lucas (Inactive) [ 26/Jun/18 ] |
I think it may, but I'm not certain. Are you able to see a change in your monitoring? If you would like to upload a fresh copy of diagnostic.data I can also take a look.
| Comment by Ivan Filimonov [ 26/Jun/18 ] |
Done. Will it free the excess memory after the protocol version change, or do I need to restart the member?
| Comment by Bruce Lucas (Inactive) [ 26/Jun/18 ] |
Hi Ivan, thanks for uploading the data. It shows an accumulation of about 1 GB of excess allocated memory over the course of about 6 days:
The heap profiler indicates that that memory is allocated by the following call stack:
The log file indicates that you are using protocol version 0 (pv0), and I suspect that the behavior you are seeing could be a result of that. Can you try upgrading to pv1 to see if this addresses the problem? Thanks,
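The protocol version lives in the replica set configuration and can be checked and changed from the mongo shell connected to the primary; a minimal sketch of the pv0-to-pv1 change (assumes all members are healthy and on a version that supports pv1):

{code:javascript}
// Check the current replica set protocol version; pv0 configs may omit the field.
var cfg = rs.conf();
print("current protocolVersion: " + (cfg.protocolVersion || 0));

// Upgrade to protocol version 1 and apply the new configuration.
if (cfg.protocolVersion !== 1) {
    cfg.protocolVersion = 1;
    rs.reconfig(cfg);
}
{code}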
| Comment by Ivan Filimonov [ 26/Jun/18 ] |
@Bruce Lucas, it has just entered the alarm state again. It seems a little sooner than last time... Any updates on it?
| Comment by Ivan Filimonov [ 25/Jun/18 ] |
Hi, I have uploaded the logs and diagnostic data using the URL above. The file names start with Ivan.
| Comment by Ivan Filimonov [ 20/Jun/18 ] |
Just as a quick update: a day after I restarted the node, it's clear that the issue is still being observed. I will update again when it hits the alarm.
| Comment by Ivan Filimonov [ 19/Jun/18 ] |
| Comment by Ivan Filimonov [ 19/Jun/18 ] |
Will do.
| Comment by Bruce Lucas (Inactive) [ 18/Jun/18 ] |
Thanks Ivan, you are correct; it looks like this is unrelated to index builds. Would you be willing to restart the mongod on node-e with the heap profiler enabled? To do this you would set the heapProfilingEnabled parameter at startup; for example, on the command line:
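(A sketch of such an invocation; everything below other than the heapProfilingEnabled parameter is a placeholder for node-e's existing startup options.)

{code}
# Restart node-e with heap profiling turned on; replica set name, paths and
# other options are placeholders for the node's usual configuration.
mongod --replSet rs0 --dbpath /data/db --logpath /var/log/mongodb/mongod.log --fork \
       --setParameter heapProfilingEnabled=true
{code}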
Since this is a very slow increase in memory, in order to get a good signal we would want to run this way for several days to a week. I would suggest running this way for several days, then upload
At this point I would suggest leaving the mongod running with the heap profiler enabled in case our analysis determines we need to continue to accumulate more data. Thanks,
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
I checked it myself: createIndex was called on a fairly small (at that time) and very recently created collection, and it finished successfully. Also, it looks like the mongod process simply crashed right after that; it doesn't look like we restarted it.
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
I've uploaded the log file; it doesn't seem to have ever been rotated.
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
@Bruce Lucas, I'm fairly sure all createIndex operations have completed by now, as I have checked currentOp many times since. Also, the script that creates them actually monitors currentOp and waits for the index builds to finish before starting to process the next collection. But let me fetch the logs for you.
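One way such a currentOp wait loop can be written in the mongo shell (a sketch; the filter follows the documented currentOp pattern for active index builds and is an assumption about what the script checks, not the reporter's actual code):

{code:javascript}
// Poll currentOp until no index builds remain in progress.
function waitForIndexBuilds() {
    while (true) {
        var building = db.currentOp({
            $or: [
                { op: "command", "command.createIndexes": { $exists: true } },
                { op: "none", msg: /^Index Build/ }
            ]
        }).inprog;
        if (building.length === 0) {
            return;
        }
        sleep(1000);  // poll once per second
    }
}
{code}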
| Comment by Bruce Lucas (Inactive) [ 18/Jun/18 ] |
sunrize531@gmail.com, the data for node-e is showing a slow steady increase in memory utilization since about 06-06 08:50 (UTC), coinciding with some createIndexes commands which were executed on that node when it was briefly primary. The memory increase continued after mongod was restarted on 06-08, which makes me suspect it could be due to a very long-running createIndexes command. Can you please also upload the mongod log files for that node from 06-06 to the present, or all available log files if those are no longer available? Thanks,
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
Should be uploaded now, @Bruce Lucas.
| Comment by Bruce Lucas (Inactive) [ 18/Jun/18 ] |
sunrize531@gmail.com, you can upload the files here. Thanks,
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
@ramon.fernandez, I've got the files and they're rather large... can you give me a link to that secure upload portal of yours?
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
Our system engineer isn't sure about the previous version on this cluster; it was either 3.0.12 or 3.2.8. Also, the symptoms didn't appear right after the MongoDB version update; they started after deploying a new app version. It's probably also worth mentioning that it's a Node.js application, and the new version also uses a newer MongoDB driver than before (updated to 3.0.6 from 2.2.27).
| Comment by Ivan Filimonov [ 18/Jun/18 ] |
tcmalloc data:
For node E (secondary with increased memory usage)
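For context, this kind of data comes from the tcmalloc section of serverStatus; a minimal sketch of pulling the headline counters in the mongo shell (assumes the default tcmalloc allocator, so the section is present):

{code:javascript}
// Headline tcmalloc counters: memory the server has allocated vs. memory
// tcmalloc is still holding after it was freed.
var t = db.serverStatus().tcmalloc;
print("current_allocated_bytes: " + t.generic.current_allocated_bytes);
print("heap_size:               " + t.generic.heap_size);
print("pageheap_free_bytes:     " + t.tcmalloc.pageheap_free_bytes);
print("pageheap_unmapped_bytes: " + t.tcmalloc.pageheap_unmapped_bytes);
{code}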
| Comment by Ramon Fernandez Marina [ 18/Jun/18 ] |
Thanks,