[SERVER-6656] Memory Leak - SSL Enabled Build
Created: 31/Jul/12 | Updated: 11/Jul/16 | Resolved: 24/Jan/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking, Security |
| Affects Version/s: | 2.2.0-rc0 |
| Fix Version/s: | 2.4.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Adam Comerford | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Tested on Ubuntu 11.04, SSL 0.9.8, mongod 2.2.0-rc0 |
| Operating System: | ALL | ||||
| Participants: | |||||
| Description |
To reproduce, start a mongod with SSL enabled and then repeatedly create and destroy connections:
Non-mapped virtual memory gradually trends up. The attached screenshot from MMS shows the primary trending up while the secondaries stay flat. |
| Comments |
| Comment by Gregor Macadam [ 01/Mar/13 ] | ||
|
Still fixed - 2.4.0-rc1 | ||
| Comment by Adam Comerford [ 24/Jan/13 ] | ||
|
Re-tested with the fix included and can confirm the leak is no more. Adam | ||
| Comment by auto [ 24/Jan/13 ] | ||
|
Author: Eric Milkie <milkie@10gen.com> (2013-01-24T17:32:54Z)
Message: If a BackgroundTask should attempt to use SSL (via an outbound connection), | ||
| Comment by auto [ 24/Jan/13 ] | ||
|
Author: Eric Milkie <milkie@10gen.com> (2013-01-24T15:45:22Z)
Message: | ||
| Comment by Andy Schwerin [ 23/Jan/13 ] | ||
|
Threads that use SSL sockets need to call ERR_remove_state(0) before they go out of scope, to avoid leaking per-thread SSL state. Doing it in the message_server_port threads probably suffices for the worst of the leak. Standardizing thread cleanup is probably a thing we should do for 2.5. | ||
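The cleanup described in the comment above can be sketched as a scope guard placed at the top of a connection thread's entry function. This is a minimal illustration, not the committed fix: `ThreadCleanupGuard`, `releasePerThreadSslState`, `handleConnection`, and `runDemo` are hypothetical names, and the cleanup function is a stand-in that only counts calls so the sketch builds without OpenSSL. In an SSL-enabled build against OpenSSL 0.9.8, the stand-in body would presumably be the `ERR_remove_state(0)` call named above.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper (not taken from the MongoDB source): runs a cleanup
// callable when the guard leaves scope, including on early return or throw.
class ThreadCleanupGuard {
public:
    explicit ThreadCleanupGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}

    ~ThreadCleanupGuard() {
        if (cleanup_) cleanup_();
    }

    // Non-copyable so the cleanup runs exactly once per guard.
    ThreadCleanupGuard(const ThreadCleanupGuard&) = delete;
    ThreadCleanupGuard& operator=(const ThreadCleanupGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Stand-in for releasing per-thread SSL state. ASSUMPTION: in a real
// SSL-enabled build this body would be ERR_remove_state(0) from
// <openssl/err.h>; here it only counts calls.
void releasePerThreadSslState(int* cleanupCalls) {
    ++*cleanupCalls;
}

// Hypothetical entry point for a thread that serves one connection.
void handleConnection(int* cleanupCalls) {
    ThreadCleanupGuard guard(
        [cleanupCalls] { releasePerThreadSslState(cleanupCalls); });
    // ... serve requests over the SSL socket here ...
}   // guard's destructor runs here, just before the thread exits

// Starts `connections` short-lived threads one after another and returns
// how many times the cleanup ran. Each thread is joined before the next
// starts, so the plain int counter is never accessed concurrently.
int runDemo(int connections) {
    int cleanupCalls = 0;
    for (int i = 0; i < connections; ++i) {
        std::thread worker(handleConnection, &cleanupCalls);
        worker.join();
    }
    return cleanupCalls;
}
```

Calling `runDemo(n)` should run the cleanup once per thread. Tying the call to a destructor rather than to the end of the function body means it still runs if the connection handler returns early or throws.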
| Comment by Eric Milkie [ 23/Jan/13 ] | ||
|
Andy is going to try to make a smaller reproducer. | ||
| Comment by Eric Milkie [ 22/Jan/13 ] | ||
|
The leak stops if I use the system allocator (glibc malloc); when I switch back to using tcmalloc, the leak returns. | ||
| Comment by Eric Milkie [ 22/Jan/13 ] | ||
|
I can confirm that there is no memory leak in 2.0 or 2.2 using this method of testing; only SSL connections seem to leak. | ||
| Comment by Eric Milkie [ 22/Jan/13 ] | ||
|
I found that the heap checker shows that only CRYPTO_malloc and the snapshot thread allocated memory during a test run. The CRYPTO_mallocs are from the session caching that OpenSSL does. I tried disabling this and the allocations vanished from the heap report, but the leak still happened. I also tried making the snapshot array a size of 1 instead of 100, and that eliminated the snapshot thread allocations, and yet the leak still happens. | ||
| Comment by Eric Milkie [ 21/Jan/13 ] | ||
|
I'm about to play with tcmalloc some more and see if I can coax it to tell me which part of the heap is growing and why. | ||
| Comment by Adam Comerford [ 21/Jan/13 ] | ||
|
OK, have to admit that I thought you were referring to my first example above literally, I did not grok that firstExample was something else entirely - never heard of it before TBH. | ||
| Comment by Eric Milkie [ 21/Jan/13 ] | ||
|
The same while loop as above is not the same as running firstExample in a loop, so I'm not quite sure what you really ran. I tried running the openssl s_client tester on Windows, and the server responds with an error about a message being too long. My guess is that connecting this way isn't totally benign and might be causing parsing issues on the server. I tried running firstExample in a loop on Windows, and there is no growth at all, even with SSL. Thus, I am inclined to blame tcmalloc for the memory growth we see on Linux. | ||
| Comment by Adam Comerford [ 21/Jan/13 ] | ||
|
Correct - ran the same while loop as above, still leaks memory very quickly. Eventually the OOM Killer kicked in:
Adam | ||
| Comment by Eric Milkie [ 17/Jan/13 ] | ||
|
How did you do the testing – did you run firstExample in a loop? | ||
| Comment by Adam Comerford [ 17/Jan/13 ] | ||
|
Initial testing seems to indicate that this is SSL only - I was unable to trigger a similar leak on the non-SSL build, but 2.3.2 still exhibits the non-mapped increase. | ||
| Comment by Adam Comerford [ 17/Jan/13 ] | ||
|
milkie - tests running now on 2.3.2 - will let you know | ||
| Comment by Eric Milkie [ 11/Jan/13 ] | ||
|
@Adam, can you confirm that this is not specific to using SSL or the SSL build? | ||
| Comment by Eric Milkie [ 29/Nov/12 ] | ||
|
After spending a few days on this, I have not found the source of the leak. I can reproduce the issue, but I can't get Google Heap Checker, Valgrind Memcheck, or Massif to show me where the memory is leaking. I also ran mongod without --ssl under a similar test (firstExample in a loop) and saw a similar virtual memory growth pattern, so my suspicion is that this is not SSL-specific. | ||
| Comment by Adam Comerford [ 13/Aug/12 ] | ||
|
Now that |