[JAVA-853] Client runs out of connections with nested calls to DB.requestStart Created: 11/Jun/13  Updated: 07/Aug/15  Resolved: 19/Aug/13

Status: Closed
Project: Java Driver
Component/s: Connection Management
Affects Version/s: 2.11.1
Fix Version/s: 2.11.3, 3.0.0

Type: Bug Priority: Major - P3
Reporter: Jo Voordeckers Assignee: Jeffrey Yemin
Resolution: Done Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File full1.txt    
Issue Links:
Duplicate
is duplicated by JAVA-915 ConnectionWaitTImeout thrown after up... Closed

 Description   

Initially we thought our clients had lost their connection and couldn't recover, but it seems we run out of client connections to our primary after a while.

We run a replica set with secondary preferred and use requestStart/requestDone. In the testing we did, our server requests return correctly, yet the MongoDB connections are never released.

More details of the problem:

https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/4lWrfaOB_EI

Full thread dump attached from after the fact, where the client is "stuck" waiting for a connection while its connectionsPerHost is maxed out.
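
For context, the pattern in use looks roughly like this (an illustrative sketch with placeholder host names, not our actual code):

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

import java.net.UnknownHostException;
import java.util.Arrays;

public class RequestPattern {
    public static void main(String[] args) throws UnknownHostException {
        MongoClient mongo = new MongoClient(Arrays.asList(
                new ServerAddress("host1"), new ServerAddress("host2")));
        mongo.setReadPreference(ReadPreference.secondaryPreferred());
        DB db = mongo.getDB("app");

        db.requestStart();       // pin this thread to a single connection
        try {
            db.getCollection("users").findOne();
        } finally {
            db.requestDone();    // release the connection back to the pool
        }
    }
}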



 Comments   
Comment by Jeffrey Yemin [ 31/Mar/15 ]

Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released.

Comment by Jeffrey Yemin [ 03/Oct/13 ]

Graeme,

We're going to need a test case to get any further on this. Can you see if you can get one from a GORM user that's experiencing this problem?

Comment by Graeme Rocher [ 03/Oct/13 ]

What happens in a Grails/GORM app is that we have a filter that intercepts every web request and does the DB.requestStart/requestDone for every request thread. The user never explicitly calls disconnect(); that is done by the framework. So if there is a bug, it is at the driver or framework level (in GORM).

In a debugger I can see requestStart() and requestDone() being called correctly. What I don't know is whether the user application that is seeing the problem is spawning any child threads from a request, or creating any new sessions (which would have their own requestStart()/requestDone() lifecycle), although those would participate in the existing requestStart/requestDone unless they were on a different thread.
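
For reference, a hypothetical sketch of the kind of per-request filter described above (not GORM's actual code; the class name and wiring are made up):

import com.mongodb.DB;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;

public class MongoRequestFilter implements Filter {
    private DB db;   // assume this is injected by the framework

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        db.requestStart();              // pin the request thread to one connection
        try {
            chain.doFilter(req, res);   // run the actual web request
        } finally {
            db.requestDone();           // must balance the requestStart above
        }
    }

    public void init(FilterConfig config) {
    }

    public void destroy() {
    }
}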

Comment by Jeffrey Yemin [ 02/Oct/13 ]

Hi Graeme,

The behavior did change in 2.11.3. Previously if you had nested calls to DB.requestStart/requestDone, the driver would end the request after the first call to requestDone, e.g.

db.requestStart();
try {
    // do some stuff
    db.requestStart();       // this is a no-op
    try {
        // do some more stuff
    } finally {
        db.requestDone();    // this ends the request for the thread, and puts the connection back in the pool
    }
    // do some more stuff (no longer in a request)
} finally {
    db.requestDone();        // no-op: the request already ended above
}

As of 2.11.3, clients must ensure that every requestStart call is matched by a corresponding requestDone:

db.requestStart();           // starts the request, and sets the counter of nested calls to 1
try {
    // do some stuff
    db.requestStart();       // increments the counter
    try {
        // do some more stuff
    } finally {
        db.requestDone();    // decrements the counter; the thread is still in a request, and the connection is still reserved
    }
    // do some more stuff (still in a request)
} finally {
    db.requestDone();        // decrements the counter to 0; the connection is returned to the pool
}

The only place in GORM where I see calls to DB.requestStart/requestDone is in MongoSession. Can you confirm that this problem is reproducible even for users who are properly calling MongoSession.disconnect() on every session? If they're not, then given the behavior change in 2.11.3, I could see how this would have become an issue.

Comment by Graeme Rocher [ 02/Oct/13 ]

This is still an issue in 2.11.3 and the root cause of http://jira.grails.org/browse/GPMONGODB-307

Only downgrading to an older version of the driver fixes the problem. Can someone please reopen and address this issue?

Comment by auto [ 12/Sep/13 ]

Author: Jeff Yemin (jyemin) <jeff.yemin@10gen.com>

Message: JAVA-853: Properly handling nested calls to DB.requestStart/requestDone by keeping track of the nesting depth, and only removing the thread local when the depth returns to 0.
Branch: 2.11.x
https://github.com/mongodb/mongo-java-driver/commit/cf49522ec8c89a9d62e58f0836efd4c46166ac98

Comment by Jeffrey Yemin [ 19/Aug/13 ]

The driver now keeps a counter of nested calls in the thread-local values, which is incremented in DB.requestStart and decremented in DB.requestDone. The thread local is only removed (and the connection released back to the pool) when the count returns to 0. If there are too many calls to DB.requestDone, the extra ones are ignored, which was the behavior pre-2.11.

The difference in behavior from 2.10: in 2.10 and below, if you had multiple calls to requestStart in a thread, the first call to requestDone took you out of the request. Now you remain in a request until there have been as many calls to requestDone as there were to requestStart.
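
For illustration, the counting logic amounts to something like this (a minimal sketch of the pattern, not the driver's actual code):

public class NestingCounter {
    private static final ThreadLocal<Integer> depth = new ThreadLocal<Integer>();

    public static void requestStart() {
        Integer d = depth.get();
        depth.set(d == null ? 1 : d + 1);   // the 0 -> 1 transition is where a connection gets reserved
    }

    public static void requestDone() {
        Integer d = depth.get();
        if (d == null) {
            return;                         // extra requestDone calls are ignored
        }
        if (d <= 1) {
            depth.remove();                 // back to 0: the connection returns to the pool
        } else {
            depth.set(d - 1);
        }
    }
}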

Comment by auto [ 19/Aug/13 ]

Author: Jeff Yemin (jyemin) <jeff.yemin@10gen.com>

Message: JAVA-853: Properly handling nested calls to DB.requestStart/requestDone by keeping track of the nesting depth, and only removing the thread local when the depth returns to 0.
Branch: 3.0.x
https://github.com/mongodb/mongo-java-driver/commit/f28e8d6522415f8f98cfb92a0c354941be00e2e9

Comment by auto [ 19/Aug/13 ]

Author: Jeff Yemin (jyemin) <jeff.yemin@10gen.com>

Message: JAVA-853: Properly handling nested calls to DB.requestStart/requestDone by keeping track of the nesting depth, and only removing the thread local when the depth returns to 0.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/eb06150485f13086a6a9b1ffa699177d253a245f

Comment by Jeffrey Yemin [ 11/Jun/13 ]

I'm going to leave this open, as we may choose to fix it in the future, but I'm glad you have a workaround.

Comment by Jo Voordeckers [ 11/Jun/13 ]

Awesome findings! I came to the same conclusion in the meantime.

We do need these semantics in some areas, but I have a workaround.

I now avoid these nested requests to the driver: I changed our code to use a counter that tracks how many start/done calls we've made before actually calling into the Mongo driver. This could easily be implemented at the driver level as well, but it's clean enough for me the way I've done it.

Additionally, it now lets me track whether there is a pending MongoDB start/done, using a sort of interceptor for every incoming request. This way, should we ever miss a requestDone(), we'll find out quickly and make the necessary call to avoid a connection leak later on.
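
A rough sketch of the idea (hypothetical names, not the exact code we use):

import com.mongodb.DB;

public final class RequestScope {
    private static final ThreadLocal<Integer> depth = new ThreadLocal<Integer>();

    public static void start(DB db) {
        Integer d = depth.get();
        if (d == null || d == 0) {
            db.requestStart();      // only the outermost start reaches the driver
        }
        depth.set(d == null ? 1 : d + 1);
    }

    public static void done(DB db) {
        Integer d = depth.get();
        if (d == null || d == 0) {
            return;                 // unbalanced done: ignore it
        }
        if (d == 1) {
            depth.remove();
            db.requestDone();       // only the outermost done reaches the driver
        } else {
            depth.set(d - 1);
        }
    }

    // checked by the per-request interceptor to catch a missed done()
    public static boolean isPending() {
        Integer d = depth.get();
        return d != null && d > 0;
    }
}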

Regarding the error message we encountered: it's accurate, but it points in the direction of timeout or network settings. It would be nice if the driver could be more specific about why it failed to get a connection, such as hitting the connection limit in our case.

Thanks for your help!

Comment by Jeffrey Yemin [ 11/Jun/13 ]

This will do it too:

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;

public class JAVA853 {
    public static void main(String[] args) throws UnknownHostException, InterruptedException {
        Mongo m = new MongoClient();
        DB db = m.getDB("test");
        DBCollection collection = db.getCollection("test");

        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            db.requestStart();
            try {
                collection.findOne();
                db.requestStart();           // nested request
                try {
                    collection.findOne();
                    System.out.println(i);
                } finally {
                    db.requestDone();
                }
            } finally {
                db.requestDone();
            }
        }
    }
}

I looked at the code, and it just doesn't support nested calls like this, so you're going to have to remove them for now. Are you sure you need "request" semantics at all in your application? What's the use case for it?

Comment by Jeffrey Yemin [ 11/Jun/13 ]

I have a test program that hangs:

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;

public class JAVA853 {
    public static void main(String[] args) throws UnknownHostException, InterruptedException {
        Mongo m = new MongoClient();
        DB db = m.getDB("test");
        DBCollection collection = db.getCollection("test");

        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            db.requestStart();
            try {
                db.requestEnsureConnection();     // binds the request to a connection
                db.requestStart();                // nested request
                try {
                    db.requestEnsureConnection();
                    collection.findOne();
                    System.out.println(i);
                } finally {
                    db.requestDone();
                }
            } finally {
                db.requestDone();
            }
        }
    }
}

Does this reflect what your application is doing? If I remove the calls to requestEnsureConnection, it no longer hangs. Are you sure you need those calls? That method is generally only used by clients that want to make sure that the "request" is bound to a connection on the primary server.
