[JAVA-1164] ThreadLocal<Random> anonymous subclass prevents classloader from being garbage collected Created: 10/Apr/14  Updated: 01/Apr/16  Resolved: 23/Sep/14

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 2.12.0
Fix Version/s: 2.12.4, 3.0.0

Type: Bug Priority: Major - P3
Reporter: Jochen Kemnade Assignee: Jeffrey Yemin
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible

 Description   

Quoting from Stefan's comment:

The problem is how BaseCluster.random is initialized to a ThreadLocal subclass instance. That class references the webapp's classloader. The ThreadLocal subclass in turn is referenced by each Thread instance (that's how ThreadLocals are implemented, they have a "helper-Map" in each Thread instance, so the leak is actually not a tiny Random instance but the whole webapp's classloader with a bunch of class definitions and statically referenced parts of the webapp



 Comments   
Comment by Githook User [ 30/Jan/15 ]

Author:

{u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}

Message: JAVA-1164: Replaced use of ThreadLocal<Random> subclass (used to override initialValue()) with a method
that explicitly gets the thread local value and sets it if it's null. This avoids some nasty classloader issues
that show up particularly in web applications, where this anonymous subclass is holding a reference to the
entire web app classloader, which is not released until the thread exits.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/497aeabbe8038c6d72851a85dc28b08219d69672

Comment by Jeffrey Yemin [ 15/Oct/14 ]

Fix released in 2.12.4

Comment by Githook User [ 23/Sep/14 ]

Author:

{u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}

Message: JAVA-1164: Replaced use of ThreadLocal<Random> subclass (used to override initialValue()) with a method that explicitly gets the thread local value and sets it if it's null.
This avoids some nasty classloader issues that show up particularly in web applications, where this anonymous subclass is holding a reference to the entire web app classloader,
which is not released until the thread exits.
Branch: 2.12.x
https://github.com/mongodb/mongo-java-driver/commit/e66bda8731b40122adf30ca424d9f81828cee8de

Comment by Githook User [ 23/Sep/14 ]

Author:

{u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}

Message: JAVA-1164: Replaced use of ThreadLocal<Random> subclass (used to override initialValue()) with a method
that explicitly gets the thread local value and sets it if it's null. This avoids some nasty classloader issues
that show up particularly in web applications, where this anonymous subclass is holding a reference to the
entire web app classloader, which is not released until the thread exits.
Branch: 3.0.x
https://github.com/mongodb/mongo-java-driver/commit/497aeabbe8038c6d72851a85dc28b08219d69672

Comment by Jeffrey Yemin [ 23/Sep/14 ]

I tested the fix using Apache Tomcat 8.0.12 and a trivial servlet. With the 2.12.3 driver I see the SEVERE log message reported in an earlier comment, and it's no longer there using 2.12.4-SNAPSHOT.

Comment by Jeffrey Yemin [ 23/Sep/14 ]

The fix is also available in a 2.12.4 SNAPSHOT at: https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongo-java-driver/2.12.4-SNAPSHOT/mongo-java-driver-2.12.4-20140923.101931-15.jar

Comment by Jeffrey Yemin [ 22/Sep/14 ]

Snapshot is available here: https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongo-java-driver/2.13.0-SNAPSHOT/

Comment by Jeffrey Yemin [ 18/Sep/14 ]

Thanks stefanl for getting to the bottom of this. I pushed a fix to master with the suggested solution, so anyone is willing to test it, please let me know. A 2.13.0-SHAPSHOT will be available as soon as the build passes.

Comment by Githook User [ 18/Sep/14 ]

Author:

{u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}

Message: JAVA-1164: Replaced use of ThreadLocal<Random> subclass (used to override initialValue()) with a method that explicitly gets the thread local value and sets it if it's null.
This avoids some nasty classloader issues that show up particularly in web applications, where this anonymous subclass is holding a reference to the entire web app classloader,
which is not released until the thread exits.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/eb3c7a413a3b65c814dfdac9e141fad4677fba92

Comment by Gardella Juan Pablo [ 17/Sep/14 ]

I've tested Stefan Larson suggestion and Tomcat doesn't show the SEVERE message anymore.

Comment by Jochen Kemnade [ 22/Jul/14 ]

Still an issue in 2.12.3.

Comment by Anthony Kroeker [ 16/Jul/14 ]

Just thought I'd report that I've also come across this issue.

We run and track several MongoClient instances at a time, and .close() them all when the web-app shutsdown/reloads (or earlier, if they aren't required any longer).

Comment by Jochen Kemnade [ 10/Jul/14 ]

Thanks Stefan, that's very helpful information.
I think that if you insist on using a ThreadLocal, there should be a remove() method if that's the only way to prevent a memory leak. There should not be any leaks, no matter how big or small they are.
Also, I think that Stefan's is right, we might be overdoing it a bit with the randomness. But maybe you could do something like at the bottom of http://java-performance.info/java-util-random-java-util-concurrent-threadlocalrandom-multithreaded-environments/:
Create a array of Random instances and use them one after another. If you use a "rolling" AtomicLong as the index counter and a sufficiently large array, that should do the trick. Something like

 
private static final AtomicLong counter = new AtomicLong();
private static final Random[] randomArray;
 
static {
  randomArray = new Random[1000];
  for (int i = 0; i<randomArray.length; i++){
    randomArray[i] = new Random();
  }
}
 
protected Random getRandom() {
  int index = counter.getAndIncrement() % randomArray.length;
  return randomArray[index];
}

I haven't tested or tried to compile the code but it should give you the general idea.

Comment by Stefan Larsson [ 10/Jul/14 ]

The problem is how BaseCluster.random is initialized to a ThreadLocal subclass instance. That class references the webapp's classloader. The ThreadLocal subclass in turn is referenced by each Thread instance (that's how ThreadLocals are implemented, they have a "helper-Map" in each Thread instance, so the leak is actually not a tiny Random instance but the whole webapp's classloader with a bunch of class definitions and statically referenced parts of the webapp

Starting with Java 7 you can use java.util.concurrent.ThreadLocalRandom to sidestep the issue, but then you depend on Java 7, might be too early for that.

Another option is to not override ThreadLocal at all, just replace the random declaration with

private static final ThreadLocal<Random> random = new ThreadLocal<Random>();

And then rewrite getRandom() to something like this:

protected Random getRandom() {
  Random result = random.get();
  if (result == null) {
    result = new Random();
    random.set(result);
  }
  return result;
}

Yes, there's a small race condition in the code above, but it doesn't matter if an extra temporary Random instance is created and overwritten occasionally. This latter way, you just introduce a reference from a Thread instance to a JDK Random class instance (plus a ThreadLocal instance per Thread that used it), not a class from the webapp's classloader, which has way less impact.

You could also add a cleanup method to BaseCluster which calls random.remove(), but if you'd call that after each application request after selecting servers once or a few times there's not much advantage.

By the way, is it really beneficial to pick a random server? Wouldn't round-robin with a simple counter (either plain or volatile since it's not critical it always increases by one (you wanted randomness, right?) or an AtomicInteger to be sure it's incrementing by one each time) work at least as well?

Comment by Jeffrey Yemin [ 21/May/14 ]

Can you provide a small sample app that reproduces the issue? I'd like to see exactly where the application creates and destroys the MongoClient, and add some instrumentation.

Comment by Jochen Kemnade [ 21/May/14 ]

and 2.12.2

Comment by Jochen Kemnade [ 02/May/14 ]

Also affects 2.12.1

Comment by Jochen Kemnade [ 15/Apr/14 ]

Oh, I'm sorry, I was a bit absend-minded. I do have a singleton MongoClient. The request handler threads call requestStart and requestEnsureConnection when they retrieve a DB from the MongoClient and requestDone when they are done.

Comment by Jeffrey Yemin [ 15/Apr/14 ]

One thing I would advise you to do is avoid creating a MongoClient for each request. There is quite a bit of overhead for a new MongoClient, and, as it's thread-safe, users generally treat a MongoClient as a singleton that is shared across all requests. If you go this route, you would open and close the MongoClient via a ServletContextListener.

Comment by Jochen Kemnade [ 15/Apr/14 ]

The main problem is that Tomcat uses a thread pool to handle incoming requests. A request creates and eventually closes the MongoClient. Upon completion, the thread is returned to the pool but the ThreadLocal is not removed. IMHO, closing the MongoClient should remove the ThreadLocal.
I can create a sample application but that will probably take some time.

Comment by Jeffrey Yemin [ 15/Apr/14 ]

Hi Jochen,

I think we need to try to reproduce this to figure out what's going on. Can you provide a small sample web app that demonstrates the problem? In particular, I'd like to see exactly where the MongoClient instance is created and destroyed.

Thanks,
Jeff

Comment by Jochen Kemnade [ 15/Apr/14 ]

Yes, all MongoClient instances (in fact, there is only a single one) are closed when the application is shut down.

Comment by Jeffrey Yemin [ 14/Apr/14 ]

OK, thanks for the new information. Does the web application close all MongoClient instances before it's stopped? If not, please ensure that it does and let me know if the log message is still printed.

Comment by Jochen Kemnade [ 14/Apr/14 ]

I don't think that this is fixed. I use the Java driver in a web application that is deployed in a Tomcat container. The message

SEVERE: The web application [/webapp] created a ThreadLocal with key of type [com.mongodb.BaseCluster$1] (value [com.mongodb.BaseCluster$1@106674f7]) and a value of type [java.util.Random] (value [java.util.Random@66479b9]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

appears in the log file. I haven't yet found the time to investigate why that is the case though.

Comment by Jeffrey Yemin [ 14/Apr/14 ]

As I haven't heard back from you, I'm assuming that it's clear this is not a bug. Please re-open if you disagree.

Regards,
Jeff

Generated at Thu Feb 08 08:53:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.