[JAVA-2238] Connection and thread leakage in DefaultServerMonitor Created: 06/Jul/16 Updated: 01/Dec/16 Resolved: 15/Jul/16 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Connection Management |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | 3.3.0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Alexander Bulaev | Assignee: | Jeffrey Yemin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
During network issues between mongos and the application server, we found massive connection and thread leakage. In jstack it looks like this:
We saw up to 6-7 thousand of these useless threads per application server. At that point our mongos was killed by a "too many open files" error. |
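A leak of this kind can also be watched from inside the application, not only through jstack. The following is a minimal sketch, not driver code: it counts the live threads in the current JVM whose names start with a given prefix. The prefix `cluster-ClusterId` is taken from the thread name quoted in the comments on this issue; the class and method names are hypothetical.

```java
import java.util.Set;

public class MonitorThreadGauge {

    // Counts live threads in this JVM whose name starts with the given prefix.
    public static long countThreadsWithPrefix(String prefix) {
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        Set<Thread> threads = Thread.getAllStackTraces().keySet();
        long count = 0;
        for (Thread t : threads) {
            if (t.getName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: server monitor threads are named "cluster-ClusterId...".
        long monitors = countThreadsWithPrefix("cluster-ClusterId");
        System.out.println("monitor threads: " + monitors);
    }
}
```

If this number keeps growing while the set of servers and MongoClient instances stays the same, monitor threads are probably being leaked.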
| Comments |
| Comment by Roy Rim [ 01/Dec/16 ] | ||||||||||||||
|
Hi Matt, This is Roy Rim, I've been working with Josh. I don't believe that the 5,280 instances of these threads are due to the bug. "cluster-ClusterId {value='581757785d1b1e2a13be626d', description='null'}-10.10.10.10:27017" is generated from the MongoClient clusterId and the specific mongod/mongos address and port. If you attempt to get unique values of just the MongoClient clusterId you will end up with 1056 instances:
1,056 multiplied by the number of nodes in the replica set (5) equals 5,280. You can see that list by running the below:
If these 1056 instances were generated by the bug identified by I have asked Josh to look into any recent changes in the code base. Ideally, the code should ensure that MongoClient instances are closed when appropriate. For instance, if a bundle is deployed without the JVM itself being shut down and new MongoClient instances are brought up, the old instances should be closed. Regards, | ||||||||||||||
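The grouping described in the comment above (one monitor thread per server address for each MongoClient clusterId) can be checked with a small stand-alone sketch. It assumes the thread names have already been extracted from a thread dump and follow the form quoted above; the class and method names are hypothetical and not part of the driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterThreadCount {

    // Assumed name format: cluster-ClusterId{value='<id>', description='...'}-<host>:<port>
    // Optional whitespace before '{' is tolerated, as in the name quoted above.
    private static final Pattern NAME = Pattern.compile(
            "cluster-ClusterId\\s*\\{value='([0-9a-f]+)', description='[^']*'\\}");

    // Returns clusterId -> number of monitor threads carrying that id.
    public static Map<String, Integer> countByClusterId(List<String> threadNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : threadNames) {
            Matcher m = NAME.matcher(name);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample names: two clients, the first one monitoring two hosts.
        List<String> names = Arrays.asList(
                "cluster-ClusterId{value='aaa1', description='null'}-10.10.10.10:27017",
                "cluster-ClusterId{value='aaa1', description='null'}-10.10.10.11:27017",
                "cluster-ClusterId{value='bbb2', description='null'}-10.10.10.10:27017");
        Map<String, Integer> counts = countByClusterId(names);
        System.out.println("distinct clusterIds: " + counts.size());
        System.out.println("threads per clusterId: " + counts);
    }
}
```

The number of distinct clusterIds approximates the number of MongoClient instances that are still open, so a large value points at clients that were never closed and not at leaked monitor threads within a single client.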
| Comment by Matt Robson [ 01/Dec/16 ] | ||||||||||||||
|
Thanks. If that's the case, then we are still looking at a similar problem, even with 3.3.0. We are seeing about 5300 of these threads across 5 mongo nodes. The stack below does appear to line up with the corrected commit.
Edit; | ||||||||||||||
| Comment by Jeffrey Yemin [ 01/Dec/16 ] | ||||||||||||||
|
Hi Matt, I looked more closely and discovered that the r3.3.0 tag was not on the correct commit. That's been fixed, and you can now see that the fix for this issue was indeed included in the 3.3.0 release, as suggested by the fix version. Thanks very much for bringing this to our attention. | ||||||||||||||
| Comment by Jeffrey Yemin [ 29/Nov/16 ] | ||||||||||||||
|
The 3.4.0 release will be announced today (it's already available from Maven Central). Will you be able to upgrade to 3.4.0? | ||||||||||||||
| Comment by Matt Robson [ 29/Nov/16 ] | ||||||||||||||
|
Morning! The fix for this issue was committed to the 3.3.x branch on July 15th, but it missed the 3.3.0 tag of June 30th, so it is not fixed in 3.3.0. It is only fixed in 3.4. Edit: | ||||||||||||||
| Comment by Githook User [ 15/Jul/16 ] | ||||||||||||||
|
Author: Jeff Yemin (jyemin, jeff.yemin@10gen.com) Message: | ||||||||||||||
| Comment by Alexander Bulaev [ 07/Jul/16 ] | ||||||||||||||
|
Also, a hint from our load testers: set a low ulimit for mongos so that it dies from too many open connections and restarts in a loop (we use our own scripts to restart broken mongoses). This is roughly what happened in our production environment. | ||||||||||||||
| Comment by Alexander Bulaev [ 07/Jul/16 ] | ||||||||||||||
|
Ross, we have run our own test in the load-testing environment and the fix definitely works. We also plan to deploy a forked and patched driver in production until this fix is merged upstream. | ||||||||||||||
| Comment by Ross Lawley [ 07/Jul/16 ] | ||||||||||||||
|
Thanks for confirming, alexbool. Could you also provide environment details? I'm hoping to reproduce the issue locally to ensure there are no other fixes needed. All the best, Ross | ||||||||||||||
| Comment by Alexander Bulaev [ 07/Jul/16 ] | ||||||||||||||
|
Hi Ross, we reproduced this bug on the 3.2.x branch, including 3.2.2, and on numerous earlier versions. | ||||||||||||||
| Comment by Ross Lawley [ 07/Jul/16 ] | ||||||||||||||
|
Hi alexbool, Thanks for the ticket and the PR - can you confirm the version you are using and seeing this issue on? Ross | ||||||||||||||
| Comment by Alexander Bulaev [ 06/Jul/16 ] | ||||||||||||||
|
I've opened a PR: https://github.com/mongodb/mongo-java-driver/pull/359 |