[SERVER-6656] Memory Leak - SSL Enabled Build
Created: 31/Jul/12  Updated: 11/Jul/16  Resolved: 24/Jan/13

Status: Closed
Project: Core Server
Component/s: Networking, Security
Affects Version/s: 2.2.0-rc0
Fix Version/s: 2.4.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Adam Comerford
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Tested on Ubuntu 11.04, OpenSSL 0.9.8, mongod 2.2.0-rc0


Attachments: Screen Shot 2013-03-01 at 14.20.27.png, Screen Shot 2013-03-01 at 14.20.48.png, memleak-mongos.png, memleak-ssl-2.2.0-rc0.png
Issue Links:
Depends
Operating System: ALL
Participants:

 Description   

./mongod --version
db version v2.2.0-rc0, pdfile version 4.5
Tue Jul 31 12:34:53 git version: 33dc8445316479bbaa062db00f179fa5c39bbddb

ldd mongod
	linux-vdso.so.1 =>  (0x00007fff9ad9d000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd01c15a000)
	libssl.so.0.9.8 => /lib/libssl.so.0.9.8 (0x00007fd01bf07000)
	libcrypto.so.0.9.8 => /lib/libcrypto.so.0.9.8 (0x00007fd01bb77000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd01b96f000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd01b669000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd01b3e3000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd01b1cd000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd01ae39000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd01c37e000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd01ac34000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd01aa1c000)

To reproduce, start a mongod with SSL enabled, then create and destroy connections in a loop:

while (true) ; do echo "GET /" | openssl s_client -connect <host>:<port> ; done
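
The same connect-and-disconnect churn can be scripted when a bounded, countable run is more convenient than an endless shell loop. This is a minimal sketch, not the reporter's method: `open_tls` and `churn` are hypothetical helper names, certificate verification is disabled on the assumption that the test server uses a self-signed certificate, and a current Python/OpenSSL may refuse to negotiate with a server as old as the one in this ticket.

```python
import socket
import ssl


def open_tls(host, port, timeout=5.0):
    """Open one TLS connection without verifying the server certificate."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


def churn(connect, iterations):
    """Open and close `iterations` connections; return how many completed."""
    completed = 0
    for _ in range(iterations):
        try:
            # The with-block closes the connection as soon as the write is done.
            with connect() as conn:
                conn.sendall(b"GET /\n")
            completed += 1
        except OSError:
            # Refused connections and TLS errors (ssl.SSLError is an OSError)
            # are skipped so the loop keeps exercising the server.
            pass
    return completed
```

A run against a test server would look like `churn(lambda: open_tls("localhost", 27017), 1000)`, where the host and port are placeholders for the instance under test.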

Non-mapped virtual memory gradually trends upward. The attached MMS screenshot shows the primary trending up while the secondaries stay flat.
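
Outside MMS, the same trend could be watched by sampling the `mem` section of `serverStatus` at intervals. The sketch below assumes that non-mapped virtual memory is approximated as `virtual` minus `mappedWithJournal` (or minus `mapped` when journaling is off); the function names and the sample figures are hypothetical and are not measurements from this ticket.

```python
def non_mapped_mb(mem):
    """Approximate non-mapped virtual memory (MB) from a serverStatus-style mem document."""
    # Assumption: prefer mappedWithJournal when present, else fall back to mapped.
    mapped = mem.get("mappedWithJournal", mem.get("mapped", 0))
    return mem["virtual"] - mapped


def grows_steadily(samples, min_increase_mb):
    """True if the samples never decrease and grow by at least min_increase_mb overall."""
    if len(samples) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] - samples[0] >= min_increase_mb


if __name__ == "__main__":
    # Hypothetical readings taken a few minutes apart during the connection loop.
    readings = [
        {"virtual": 1200, "mapped": 800},
        {"virtual": 1450, "mapped": 800},
        {"virtual": 1700, "mapped": 800},
    ]
    series = [non_mapped_mb(r) for r in readings]
    print(series, grows_steadily(series, min_increase_mb=100))
```

A flat series on the secondaries and a steadily growing one on the primary would match the pattern described above.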



 Comments   
Comment by Gregor Macadam [ 01/Mar/13 ]

Still fixed - 2.4.0-rc1
Screen Shot 2013-03-01 at 14.20.27.png
Screen Shot 2013-03-01 at 14.20.48.png

Comment by Adam Comerford [ 24/Jan/13 ]

Re-tested with the fix included and can confirm the leak is no more.

Adam

Comment by auto [ 24/Jan/13 ]

Author: Eric Milkie <milkie@10gen.com>, 2013-01-24T17:32:54Z

Message: SERVER-6656 prevent SSL memory leaks

If a BackgroundTask should attempt to use SSL (via an outbound connection),
this commit will now ensure the SSL thread-specific data is cleaned up on
thread exit.
Branch: master
https://github.com/mongodb/mongo/commit/e599e186e2482dc2a1caf86f40e80ebb7b3de5f6

Comment by auto [ 24/Jan/13 ]

Author: Eric Milkie <milkie@10gen.com>, 2013-01-24T15:45:22Z

Message: SERVER-6656 fix SSL thread storage leak (for incoming connections only)
Branch: master
https://github.com/mongodb/mongo/commit/ad47817b9397d8cfe97b9a51f1ad2e806c079d77

Comment by Andy Schwerin [ 23/Jan/13 ]

Threads that use SSL sockets need to call ERR_remove_state(0) before they go out of scope, to avoid leaking per-thread SSL state. Doing it in the message_server_port threads probably suffices for the worst of the leak.

Standardizing thread cleanup is probably a thing we should do for 2.5.
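
The per-thread cleanup described above can be illustrated with a toy model. This is not OpenSSL's implementation or the server's code: `use_library` and `remove_state` are hypothetical stand-ins for a library that lazily allocates state in a global table for each thread that touches it, and for the explicit removal call (the role ERR_remove_state(0) plays here). The sketch only shows that one thread per connection, without cleanup on exit, leaves one stale entry per connection behind.

```python
import threading

# Toy stand-in for a library-global table of per-thread state.
_per_thread_state = {}
_lock = threading.Lock()


def use_library():
    """Lazily allocate state for the calling thread on first use."""
    key = threading.current_thread().name
    with _lock:
        _per_thread_state.setdefault(key, bytearray(1024))


def remove_state():
    """Release the calling thread's state (hypothetical cleanup call)."""
    key = threading.current_thread().name
    with _lock:
        _per_thread_state.pop(key, None)


def handle_connection(cleanup):
    """Body of a connection thread; optionally cleans up before exiting."""
    try:
        use_library()
    finally:
        if cleanup:
            remove_state()


def run_connections(count, cleanup):
    """Serve `count` short-lived connections, one thread each; return leftover entries."""
    with _lock:
        _per_thread_state.clear()
    for i in range(count):
        # Unique names keep entries distinct even if the OS reuses thread ids.
        t = threading.Thread(target=handle_connection, args=(cleanup,), name=f"conn-{i}")
        t.start()
        t.join()
    with _lock:
        return len(_per_thread_state)


if __name__ == "__main__":
    print("leftover without cleanup:", run_connections(50, cleanup=False))
    print("leftover with cleanup:", run_connections(50, cleanup=True))
```

Putting the cleanup in a `finally` block mirrors the idea of doing it unconditionally when the thread goes out of scope, rather than only on the success path.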

Comment by Eric Milkie [ 23/Jan/13 ]

Andy is going to try to make a smaller reproducer.

Comment by Eric Milkie [ 22/Jan/13 ]

The leak stops if I use the system allocator (glibc malloc); when I switch back to using tcmalloc, the leak returns.

Comment by Eric Milkie [ 22/Jan/13 ]

I can confirm that there is no memory leak in 2.0 or 2.2 using this method of testing; only SSL connections seem to leak.

Comment by Eric Milkie [ 22/Jan/13 ]

I found that the heap checker shows that only CRYPTO_malloc and the snapshot thread allocated memory during a test run. The CRYPTO_mallocs are from the session caching that OpenSSL does. I tried disabling this and the allocations vanished from the heap report, but the leak still happened. I also tried making the snapshot array a size of 1 instead of 100, and that eliminated the snapshot thread allocations, and yet the leak still happens.

Comment by Eric Milkie [ 21/Jan/13 ]

I'm about to play with tcmalloc some more and see if I can coax it to tell me which part of the heap is growing and why.

Comment by Adam Comerford [ 21/Jan/13 ]

OK, I have to admit that I thought you were referring literally to my first example above; I did not grok that firstExample was something else entirely. I had never heard of it before, TBH.

Comment by Eric Milkie [ 21/Jan/13 ]

The same while loop as above is not the same as running firstExample in a loop, so I'm not quite sure what you really ran.

I tried running the openssl s_client tester on Windows, and the server responds with an error about a message being too long. My guess is that connecting this way isn't totally benign and might be causing parsing issues on the server.

I tried running firstExample in a loop on Windows, and there is no growth at all, even with SSL. Thus, I am inclined to blame tcmalloc for the memory growth we see on Linux.

Comment by Adam Comerford [ 21/Jan/13 ]

Correct - ran the same while loop as above, still leaks memory very quickly. Eventually the OOM Killer kicked in:

[437731.427263] Out of memory: Kill process 14540 (mongod) score 967 or sacrifice child
[437731.427553] Killed process 14540 (mongod) total-vm:10043164kB, anon-rss:1923888kB, file-rss:0kB

Adam

Comment by Eric Milkie [ 17/Jan/13 ]

How did you do the testing – did you run firstExample in a loop?

Comment by Adam Comerford [ 17/Jan/13 ]

Initial testing seems to indicate that this is SSL only - I was unable to trigger a similar leak on the non-SSL build, but 2.3.2 still exhibits the non-mapped increase.

Comment by Adam Comerford [ 17/Jan/13 ]

milkie - tests running now on 2.3.2 - will let you know

Comment by Eric Milkie [ 11/Jan/13 ]

@Adam, can you confirm that this is not specific to using SSL or the SSL build?

Comment by Eric Milkie [ 29/Nov/12 ]

After spending a few days on this, I have not found the source of the leak. I can reproduce the issue, but I can't get Google Heap Checker, Valgrind Memcheck, or Massif to show me where the memory leak is.

I tried running mongod without --ssl and ran a similar test (ran firstExample in a loop), and I see a similar virtual memory size growth pattern, so my suspicion is that this is not SSL-specific.

Comment by Adam Comerford [ 13/Aug/12 ]

Now that SERVER-6509 is fixed, I can confirm this is happening with mongos also. Screen shot attached for mongos memory trend, same methodology.
