Java Driver / JAVA-236

Slow memory leak caused by non-static ThreadLocal in DBTCPConnector

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.4
    • Affects Version/s: 2.3
    • Component/s: API
    • Labels:
      None
    • Environment:
      uname -a: "Linux domU-12-31-39-08-13-C5 2.6.18-xenU-ec2-v1.0 #2 SMP Mon Feb 18 14:28:43 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux "

      As discussed in http://groups.google.com/group/mongodb-user/browse_thread/thread/c73351031d6f98e5/9eb7f536cc98f5d4#9eb7f536cc98f5d4:

      It appears that DBTCPConnector uses a non-static ThreadLocal "MyPort" in order to provide a per-(thread, object) connection cache.

      Using non-static ThreadLocals is (probably!) fine in general, but in this instance MyPort is not removed from the thread's ThreadLocal map when the DBTCPConnector is closed, which leaks hash map entries, DBTCPConnectors, and so on.
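A minimal sketch of the pattern described above may make the leak easier to see. The `Connector` and `Port` classes are hypothetical stand-ins for DBTCPConnector and MyPort, not the driver's actual code, and the sketch assumes the cached port holds a reference back to its connector. A thread's ThreadLocal map references its keys weakly but its values strongly, so if the value can reach the connector (and through it the ThreadLocal key), the entry is not cleared while the thread lives.

```java
public class LeakSketch {
    /** Hypothetical stand-in for DBTCPConnector; not the driver's actual code. */
    static class Connector {
        /** Inner (non-static) class: each Port implicitly references its Connector. */
        class Port {
            Connector owner() { return Connector.this; }
        }

        // Non-static ThreadLocal: one per Connector, giving a per-(thread, connector) Port.
        private final ThreadLocal<Port> threadPort = new ThreadLocal<Port>() {
            @Override protected Port initialValue() { return new Port(); }
        };

        private boolean closed;

        Port port() { return threadPort.get(); }

        // Marks the connector closed but never calls threadPort.remove(), so the
        // calling thread's ThreadLocal map keeps its entry for this connector.
        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        Connector c = new Connector();
        Connector.Port before = c.port();
        c.close();
        // The entry survives close(): the thread still returns the same Port.
        System.out.println(before == c.port()); // prints true
    }
}
```

With a long-lived thread (for example a web server worker) opening and closing many such connectors, each one leaves an entry like this behind.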

      At (initially) 2 open/close connectors per second this uses up a 4GB heap in about 4 hours, with performance gradually worsening as more time is spent in GC.

      An analysis of the use of non-static thread local variables is provided here: http://www.0xcafefeed.com/tag/threadlocal/. Since Java 1.5, there is a ThreadLocal.remove() method that is intended to allow lifecycle management of ThreadLocals. Care might need to be taken to call it in every thread context the ThreadLocal has been accessed in?
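The concern about thread contexts can be illustrated with a small sketch. `ThreadLocal.remove()` only clears the value for the thread that calls it, so a `close()` that calls `remove()` leaves the entries of other threads in place. The `Connector`, `Port`, and `callOn` names below are hypothetical and only serve the illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RemoveSketch {
    /** Hypothetical connector whose close() calls ThreadLocal.remove(). */
    static class Connector {
        class Port { }

        private final ThreadLocal<Port> threadPort = new ThreadLocal<Port>() {
            @Override protected Port initialValue() { return new Port(); }
        };

        Port port() { return threadPort.get(); }

        // remove() clears only the entry belonging to the calling thread.
        void close() { threadPort.remove(); }
    }

    /** Runs a task on the executor's thread and waits for its result. */
    static <T> T callOn(ExecutorService executor, Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Connector c = new Connector();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Connector.Port workerBefore = callOn(worker, c::port);
            Connector.Port mainBefore = c.port();
            c.close(); // runs on the main thread
            // The main thread's entry was removed, so a fresh Port is created.
            System.out.println(mainBefore == c.port()); // prints false
            // The worker thread's entry is untouched by the main thread's remove().
            System.out.println(workerBefore == callOn(worker, c::port)); // prints true
        } finally {
            worker.shutdown();
        }
    }
}
```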

      I spent 5 minutes trying the remove, and it wasn't immediately clear it had worked, but I had to move on to other things, so I just removed the ThreadLocal bit (since in the code I was working on, each connection is only accessed from one thread anyway). Removing all ThreadLocal constructs (i.e. just instantiating one MyPort per DBTCPConnector) fixed the leak.

      Personally I'd just write my own static ThreadLocal object map in order to ensure I had control over removing it, and didn't need to worry about the various ThreadLocal idiosyncrasies. There was an example of one I came across when researching the leak that looked quite clean. I can probably dig it out if you can't find it/are interested.
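One possible reading of this suggestion, sketched under assumptions: replace the ThreadLocal with an explicit per-thread map that the connector controls, so that `close()` can drop the entries of every thread at once. This is not the example referred to above and not the driver's code; the map here is owned by the connector rather than static, and `Connector`, `Port`, `cachedPorts`, and `runAndJoin` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PortMapSketch {
    /** Hypothetical stand-in for the per-thread connection state. */
    static class Port { }

    /** Hypothetical connector that tracks its per-thread ports explicitly. */
    static class Connector {
        private final ConcurrentMap<Thread, Port> ports = new ConcurrentHashMap<Thread, Port>();

        Port port() {
            Thread current = Thread.currentThread();
            Port p = ports.get(current);
            if (p == null) {
                // Only the current thread ever writes its own key, so no race here.
                p = new Port();
                ports.put(current, p);
            }
            return p;
        }

        int cachedPorts() { return ports.size(); }

        // Unlike ThreadLocal.remove(), this drops the entries of all threads.
        void close() { ports.clear(); }
    }

    /** Runs a task on a new thread and waits for it to finish. */
    static void runAndJoin(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Connector c = new Connector();
        c.port();                  // entry for the main thread
        runAndJoin(c::port);       // entry for a second thread
        System.out.println(c.cachedPorts()); // prints 2
        c.close();
        System.out.println(c.cachedPorts()); // prints 0
    }
}
```

The trade-off in this sketch is that the map holds strong references to Thread objects until `close()` is called, so entries for threads that have already exited linger until then.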

      As I mentioned in the forum, I think you should get the fix in for 2.4, because a reasonably standard usage (i.e. unlike mine!) can result in a very slow leak, so people are going to deploy without noticing it and then wonder why their web server gets gradually slower.

      Good luck, let me know if I can be any more help!

      Alex

            Assignee:
            Eliot Horowitz (Inactive)
            Reporter:
            apiggott@ikanow.com Alex Piggott
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: