[JAVA-236] Slow memory leak caused by non-static ThreadLocal in DBTCPConnector Created: 17/Dec/10  Updated: 03/May/11  Resolved: 17/Dec/10

Status: Closed
Project: Java Driver
Component/s: API
Affects Version/s: 2.3
Fix Version/s: 2.4

Type: Bug Priority: Major - P3
Reporter: Alex Piggott Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

uname -a: "Linux domU-12-31-39-08-13-C5 2.6.18-xenU-ec2-v1.0 #2 SMP Mon Feb 18 14:28:43 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux "



 Description   

As discussed in http://groups.google.com/group/mongodb-user/browse_thread/thread/c73351031d6f98e5/9eb7f536cc98f5d4#9eb7f536cc98f5d4:

It appears that DBTcpConnector uses a non-static ThreadLocal "MyPort" in order to provide a per-(thread, object) connection cache.

Using non-static ThreadLocals is (probably!) fine, but in this instance when the DBTcpConnector is closed, MyPort is not removed from the ThreadLocal map resulting in a leak of hash map elements, DBTcpConnectors, etc etc.

At (initially) 2 open/close connectors per second this uses up a 4GB heap in about 4 hours, with performance gradually worsening as more time is spent in GC.

An analysis of the use of non-static thread local variables is provided here: http://www.0xcafefeed.com/tag/threadlocal/. Since Java 1.5, there is a ThreadLocal::remove function that is intended to allow lifecycle management of the ThreadLocals. Care might need to be taken to remove it in all the threads contexts it has been accessed in?

I spent 5 minutes trying the remove, and it wasn't immediately clear it had worked, but I had to move onto other things, so I just removed the ThreadLocal bit (since in the code I was working on, each connection is only accessed from one thread anyway). Removing all ThreadLocal constructs (ie just instantiating 1 MyPort per DBTCPConnector) fixed the leak.

Personally I'd just write my own static ThreadLocal object map in order to ensure I had control over removing it, and didn't need to worry about the various ThreadLocal idiosyncrasies. There was an example of one I came across when researching the leak that looked quite clean. I can probably dig it out if you can't find it/are interested.

As I mentioned in the forum, I think you should get the fix in for 2.4, because a reasonably standard usage (ie unlike mine!) can result in a very slow leak, so people are going to deploy without noticing it and then wonder why their web server gets gradually slower.

Good luck, let me know if I can be any more help!

Alex



 Comments   
Comment by Antoine Girbal [ 03/May/11 ]

Modified close() method of dbtcpconnector a bit.
If the client now call close from all thread used with a Mongo Instance, it will clean up the ThreadLocal.

Comment by auto [ 03/May/11 ]

Author:

{u'login': u'agirbal', u'name': u'agirbal', u'email': u'antoine@10gen.com'}

Message: JAVA-236: make it possible to clean up ThreadLocal
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/433853014b5a5b3c440431521811606edc0a6509

Comment by Eliot Horowitz (Inactive) [ 17/Dec/10 ]

Mongo sorts all of that out.
It has an internal connection pool and will re-open new sockets, etc...

Comment by Alex Piggott [ 17/Dec/10 ]

(By the way, I'd be quite surprised if your change actually fixed the leak - setting _myPort to null does nothing since there's still a reference to it in the master ThreadLocal map.

_myPort.remove();

or just the remove is necessary according to the documentation. As I say, that didn't seem to work for me but I was doing 100 different things at the time so I probably just cocked it up somehow.

However, it also looked like you needed to call remove() from each thread it had been accessed from, which would be a nightmare in the general case.

That's why I might have a synchronized non-static non-threadlocal hash map in your DBTCPConnector where I stored a MyPort for each thread if accessing that object - avoiding (slightly) abusing ThreadLocal which doesn't have a very nice lifecycle management API and using one you control...

A static threadlocal with a hashmap for the objects so you didn't need to do any synchronization control should also work, though you'd have to write some code to periodically cleanup entries containing objects closed from different threads, messy but not too painful. There'd be a slight risk that a load of objects got stuck with references in a thread sitting unused in a tomcat connection pool of course.
)

Comment by Alex Piggott [ 17/Dec/10 ]

Eliot:

"but why are you opening new Mongo instances all the time?
You should create 1 per JVM and leave it forever ideally."

Don't ask me, I just inherited this code recently!

One reason I can see for how it's done is that we might want to move the MongoDB around without restarting our user-facing servers. So currently for each request we get we look in a properties file and get the hostname/port of the server. Whether that's a sensible strategy (vs say updating DNS, or storing several Mongos in a map) is certainly debatable, but we're too close to deploying for me to want to make any significant changes at this point.

So a quick question: if the MongoDB goes down, do you get an exception at the user layer and then need to close/re-open the Mongo object yourself, or does Mongo catch the exception and try opening a new connection itself? If Mongo sorts all that out, I might just bite the bullet and move to a single Mongo instance...

Comment by Eliot Horowitz (Inactive) [ 17/Dec/10 ]

We should fix this, but why are opening new Mongo instances all the time?
You should create 1 per JVM and leave it forever ideally.

Generated at Thu Feb 08 08:51:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.