[JAVA-853] Client runs out of connections with nested calls to DB.requestStart Created: 11/Jun/13 Updated: 07/Aug/15 Resolved: 19/Aug/13 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Connection Management |
| Affects Version/s: | 2.11.1 |
| Fix Version/s: | 2.11.3, 3.0.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jo Voordeckers | Assignee: | Jeffrey Yemin |
| Resolution: | Done | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Initially we thought our clients lost their connection and couldn't recover, but We run a replicaset with secondary preferred and use requestStart/requestDone. In the testing we did, our server requests return correctly, however it seems the mongodb connections are never released. More details of the problem: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/4lWrfaOB_EI Full threaddump attached from after the facts, where the client is "stuck" waiting for a connection while it's connectionsPerHost is maxed out. |
| Comments |
| Comment by Jeffrey Yemin [ 31/Mar/15 ] | ||||||||||||||||||||||||||
|
Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released. | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 03/Oct/13 ] | ||||||||||||||||||||||||||
|
Graeme, We're going to need a test case to get any further on this. Can you see if you can get one from a GORM user that's experiencing this problem? | ||||||||||||||||||||||||||
| Comment by Graeme Rocher [ 03/Oct/13 ] | ||||||||||||||||||||||||||
|
So what happens in a Grails/GORM app is we have a filter that intercepts every web request and does the DB.requestStart/requestDone for every request thread. The user never explicitly calls disconnect() that is done by framework. So if there is a bug it is at the driver or framework level (in GORM) In a debugger I can see requestStart() and requestDone() being called correctly. What i don't know is whether the user application that is seeing the problem is spawning any child threads from a request or creating any new sessions (which would have their own requestStart(), requestDone() lifecycle) although these would participate in the existing requestStart/requestDone unless they were in a different thread. | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 02/Oct/13 ] | ||||||||||||||||||||||||||
|
Hi Graeme, The behavior did change in 2.11.3. Previously if you had nested calls to DB.requestStart/requestDone, the driver would end the request after the first call to requestDone, e.g.
As of 2.11.3, clients must make sure all requestStart calls are matched by a corresponding requestDone:
The only place in GORM where I see calls to DB.requestStart/requestDone are in MongoSession. Can you confirm that this problem is reproducible even for users that are properly calling MongoSession.disconnect() on every session? If they're not, given the behavior change in 2.11.3, I could see how this would have become an issue. | ||||||||||||||||||||||||||
| Comment by Graeme Rocher [ 02/Oct/13 ] | ||||||||||||||||||||||||||
|
This is still an issue in 2.11.3 and the root cause of http://jira.grails.org/browse/GPMONGODB-307 Only downgrading to an older version of the driver fixes the problem. Can someone please reopen and address this issue? | ||||||||||||||||||||||||||
| Comment by auto [ 12/Sep/13 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 19/Aug/13 ] | ||||||||||||||||||||||||||
|
The driver now keeps a counter of nested calls in the thread local values, which is incremented in DB.requestStart and decremented in DB.requestDone. The thread local is only removed (and the connection release back to the pool) when the count returns to 0. If there are too many calls to DB.requestDone, the extra ones are ignored, which is what the behavior was pre-2.11. The difference in behavior from 2.10: In 2.10 and below, if you had multiple calls to requestStart in a thread, the first call to requestDone took you out of the request. Now, you remain in a request until there are as many calls to requestDone as there were to requestStart. | ||||||||||||||||||||||||||
| Comment by auto [ 19/Aug/13 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by auto [ 19/Aug/13 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Jun/13 ] | ||||||||||||||||||||||||||
|
I'm going to leave this open, as we may choose to fix it in the future, but glad that you have a workaround. | ||||||||||||||||||||||||||
| Comment by Jo Voordeckers [ 11/Jun/13 ] | ||||||||||||||||||||||||||
|
Awesome findings, I came to the same conclusion in the mean time We do need these semantics in some areas, but I have a workaround. I now avoid these nested requests to the driver, I changed our code to use a counter to track how many start/done's we called before actually calling into the Mongo Driver. This could easily be implemented at the driver level as well, but it's clean enough for me the way I've done it. Regarding the error message we've encountered, it's correct, but it's pointing in the direction of the timeout or network settings. It would be nice if the driver could be more specific about why it failed to get a connection, like hitting a connection limit in our case. Thanks for your help! | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Jun/13 ] | ||||||||||||||||||||||||||
|
This will do it too:
I looked at the code, and it just doesn't support nested calls like this, so you're going to have to remove them for now. Are you sure you need "request" semantics at all in your application? What's the use case for it? | ||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Jun/13 ] | ||||||||||||||||||||||||||
|
I have a test program that hangs:
Does this reflect what your application is doing? If I remove the calls to requestEnsureConnection, it no longer hangs. Are you sure you need those calls? That method is generally only used by clients that want to make sure that the "request" is bound to a connection on the primary server. |