[JAVA-3087] Buffer leak when network compression enabled Created: 09/Nov/18 Updated: 28/Oct/23 Resolved: 13/Nov/18 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Wire Protocol |
| Affects Version/s: | 3.7.0 |
| Fix Version/s: | 3.9.1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Colin Bester | Assignee: | Jeffrey Yemin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu linux LTS 18.x |
||
| Attachments: |
|
| Description |
|
We have been running same s/w and configuration for couple of months with no issue and on 26 Oct we made change to RestFul server configuration to enable network compression using snappy by setting compressor=snappy when initialising mongo-scala-driver, but yesterday (8th Nov) we had system failure and clients were getting “Exception Boxed Error” on our web application (Running liftweb 3.1 on Jetty 9.2.13). Our system as seen from client end is angularjs -> liftweb application (running on jetty) -> AKKA-Http RestFul server with mongo-scala-driver -> Atlas Mongo. Only error I see in restFul log files is "ERROR io.netty.util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected”. Looking through log files the earlies I see the error is on the 26 - tied to same day we made change to enable network compression. We have production and pre-stage systems, both employing same software versions, one using M40 Mongo Atlas and the other M10 Mongo Atlas - both show this error. If I disable snappy compression on pre-staging and production server then it appears that I don't see the error. Time is still to early to tell if this is fact or not, but if I enable the compression I see error within 30 minutes. I have disabled compression for 3 hours now and have not yet seen the error. I enabled advanced logging to try and pinpoint memory leak which lead me to mongodb - I have attached log file after restarting server. Is there by chance a know issue with enabling compression? I am hoping someone can help guide me in right direction. |
| Comments |
| Comment by Colin Bester [ 26/Dec/18 ] | |||||||||||||||||||||||
|
Thanks. I think that closes this thread - thanks for all the help. | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 26/Dec/18 ] | |||||||||||||||||||||||
|
Yes, it's fine. Generally it's safe to upgrade to a patch release so long as a library follows semantic versioning, which the Java driver does. | |||||||||||||||||||||||
| Comment by Colin Bester [ 24/Dec/18 ] | |||||||||||||||||||||||
|
So overriding async driver to 3.9.1 with mongo-scala-driver 2.5.0 will be okay?
| |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 24/Dec/18 ] | |||||||||||||||||||||||
|
We haven't done a 2.5.1 release yet of the Scala driver that directly depends on 3.9.1 of the async Java driver, but you can always take a direct dependency on 3.9.1 in your own build file, and that will override Scala driver's dependency. | |||||||||||||||||||||||
| Comment by Colin Bester [ 24/Dec/18 ] | |||||||||||||||||||||||
|
Thanks, then I assume it's not rolled out to Scala driver yet? | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 24/Dec/18 ] | |||||||||||||||||||||||
|
Hi colin.bester@advancedbiotechnologiesllc.com, The bug fix is available in the 3.9.1 release of mongodb-driver-async, available on Maven Central: http://central.maven.org/maven2/org/mongodb/mongodb-driver-async/3.9.1/ | |||||||||||||||||||||||
| Comment by Colin Bester [ 24/Dec/18 ] | |||||||||||||||||||||||
|
Any idea when bug fix (Netty/Compression memory leak) https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongodb-driver-async/3.9.1-SNAPSHOT/ will be added to java and scala projects so that I can update my production project? | |||||||||||||||||||||||
| Comment by Colin Bester [ 20/Nov/18 ] | |||||||||||||||||||||||
|
Thanks @Ross | |||||||||||||||||||||||
| Comment by Ross Lawley [ 20/Nov/18 ] | |||||||||||||||||||||||
Thanks colin.bester@advancedbiotechnologiesllc.com I've only added upgrade notes when there has been a change that requires extra consideration for upgrading. I can see that with recent releases that is non obvious, so I'll explicitly add notes for all minor releases from now on. | |||||||||||||||||||||||
| Comment by Colin Bester [ 19/Nov/18 ] | |||||||||||||||||||||||
|
Now I am happy I am rebuilding for scala 2.4.2 to try the patched java version. I didn’t see upgrade notes for 2.5.0 so am only upgrading to 2.4.2 which is a little sad. ~C | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 19/Nov/18 ] | |||||||||||||||||||||||
|
Yes, it's Netty + compression | |||||||||||||||||||||||
| Comment by Colin Bester [ 19/Nov/18 ] | |||||||||||||||||||||||
|
So Netty plus compression and not just netty enabled without compression? I definitely don’t see error in log file if netty is enabled and network compression is disabled. Sent from my iPhone | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 19/Nov/18 ] | |||||||||||||||||||||||
|
It has nothing to do with SSL, and everything to do with how the driver uses Netty's io.netty.buffer.ByteBuf. I can reproduce this issue whether or not SSL is enabled. The only requirement is that Netty is enabled. | |||||||||||||||||||||||
| Comment by Colin Bester [ 19/Nov/18 ] | |||||||||||||||||||||||
|
Yup, adding streamType causes leak. Seeing as issue only shows with netty enabled and compression I assume issue is to do with compression with SSL? Thanks so much for all your assistance. I will try patch fix in today or tomorrow and fix. I am hoping I don't have to change my Mongo-scala driver version which is mongo-scala-driver_2.11-2.3.0 as I believe it uses java 3.7.0 | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 18/Nov/18 ] | |||||||||||||||||||||||
|
Just insert streamType=netty. SSL is not required to use Netty; rather, Netty is required to use SSL (with the async driver). | |||||||||||||||||||||||
| Comment by Colin Bester [ 17/Nov/18 ] | |||||||||||||||||||||||
|
I haven't loaded the snapshot jar file yet as I wanted to first be able to replicate error and then I will try. In order to enable Netty on my local setup do I just need to insert 'streamType=netty' as option or do I have to setup SSL and certificates on my local MongoDB instance? | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 17/Nov/18 ] | |||||||||||||||||||||||
|
That explains why Netty is not reporting leaks in your local setup: you're not using Netty there. Please enable Netty for your local setup and confirm that you see the leaks. Also, please double check that you're loading InternalStreamConnection from the expected jar file. Looking at the stack trace from your last comment:
which still corresponds to the line numbers from the 3.9.0 release: https://github.com/mongodb/mongo-java-driver/blob/r3.9.0/driver-core/src/main/com/mongodb/internal/connection/InternalStreamConnection.java#L324 Compare with what's on line 324 in the 3.9.1-SNAPSHOT release: https://github.com/mongodb/mongo-java-driver/blob/3.9.x/driver-core/src/main/com/mongodb/internal/connection/InternalStreamConnection.java#L324: there's no executable code on that line. So I suspect that either you're not using the 3.9.1-SNAPSHOT or we didn't publish it correctly after the change was committed. | |||||||||||||||||||||||
| Comment by Colin Bester [ 17/Nov/18 ] | |||||||||||||||||||||||
|
Several days on test setup with no Leak error messages. This morning I modified configuration to point to my staging (cluster0) Atlas cluster and issue was there out the bat.
So still no idea why no LEAK errors when connecting to standalone database vs Atlas. On Scala side my options to URL for local setup is:
and for Atlas setup is
Only difference I see is inclusion of streamType=netty | |||||||||||||||||||||||
| Comment by Colin Bester [ 15/Nov/18 ] | |||||||||||||||||||||||
|
I haven't yet but will do | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 15/Nov/18 ] | |||||||||||||||||||||||
|
I don't have an explanation for this. Have you tried io.netty.leakDetection.level=paranoid? | |||||||||||||||||||||||
| Comment by Colin Bester [ 15/Nov/18 ] | |||||||||||||||||||||||
|
Only difference between the two setups is that internal setup is not using SSL while the staging system is employing SSL (as can be seen by setting streamType in MongoDB URL. I confirm that I do not see the leak message on my test server running dedicated and standalone MongoDB instance. I don't have much control on Atlas configuration but here are snippets showing command line invocation of servers and connection URL together with other information. Both dedicated server and staging server are running same Restful application s/w version which handles MongoDB connection and read/writes. Dedicated server is running MongoDB 3.6.8 and Atlas is running MongoDB 3.6.8 on M10 instance Dedicated Rest Server with standalone mongodb instance running on same host as restful server
Database configuration in Rest server is via URL - my application config is:
Java version:
Database logs show compression in action:
Staging server using Atlas Output of ps:
Database configuration in Rest server is via URL:
Java version:
Just wanting to make sure all bases are covered in case there is more than one issue. | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 15/Nov/18 ] | |||||||||||||||||||||||
|
There must be something different either or in your configuration of the client or of the server. The patch is agnostic to server topology: if compression is enabled, the leak should manifest. | |||||||||||||||||||||||
| Comment by Colin Bester [ 15/Nov/18 ] | |||||||||||||||||||||||
|
My standalone does have snappy compression enabled and I verified it was being used by examining logs for "decompressing" and "compressing" messages. I am hoping to get new build compiled today (excluding fix to java mongo) and will confirm. | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 15/Nov/18 ] | |||||||||||||||||||||||
|
No, but I suspect that your standalone didn't have Snappy compression enabled, so compression wasn't actually happening even though you configured it in the driver. | |||||||||||||||||||||||
| Comment by Colin Bester [ 15/Nov/18 ] | |||||||||||||||||||||||
|
Does the issue you found explain why leak only occurred when connection to Atlas and not a standalone single mongodb? | |||||||||||||||||||||||
| Comment by Colin Bester [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Thanks. I am in throws of finalizing testing of new features and will include testing 3.9.1 for stage 2 testing.
| |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 13/Nov/18 ] | |||||||||||||||||||||||
|
The 3.9.1-SNAPSHOT has been published to Sonatype. See https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongodb-driver-async/3.9.1-SNAPSHOT/ | |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 13/Nov/18 ] | |||||||||||||||||||||||
Eventually, but for now you'll need to override the transitive dependency from the Scala driver by explicitly depending on 3.9.1-SNAPSHOT
Not sure
Yes, the fix will apply to any type of compression. | |||||||||||||||||||||||
| Comment by Colin Bester [ 13/Nov/18 ] | |||||||||||||||||||||||
|
I never tested using zlib compression but assume issue would have presented itself as well if I had used zLib - would you agree and would fix address both snappy and zlib? | |||||||||||||||||||||||
| Comment by Colin Bester [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Off topic I know, but is there a reason I can't use mention feature? | |||||||||||||||||||||||
| Comment by Colin Bester [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Thanks Jeff, will this change roll over to scala project as well (update dependencies)?
| |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Hi colin.bester@advancedbiotechnologiesllc.com there should be a 3.9.1-SNAPSHOT available in a few hours that contains a fix for this issue. Thanks again for the report. | |||||||||||||||||||||||
| Comment by Githook User [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Author: {'name': 'Jeff Yemin', 'email': 'jeff.yemin@10gen.com', 'username': 'jyemin'}Message: Fix buffer leak during compression Ensure that buffers used for uncompressed message prior to compression
| |||||||||||||||||||||||
| Comment by Githook User [ 13/Nov/18 ] | |||||||||||||||||||||||
|
Author: {'name': 'Jeff Yemin', 'email': 'jeff.yemin@10gen.com', 'username': 'jyemin'}Message: Fix buffer leak during compression Ensure that buffers used for uncompressed message prior to compression
| |||||||||||||||||||||||
| Comment by Colin Bester [ 12/Nov/18 ] | |||||||||||||||||||||||
|
I've been running production and staging server without snappy network compression enabled for close to three days with no error. This morning I re-enabled snappy network compression on staging server and restarted application and within the hour I saw same error again
| |||||||||||||||||||||||
| Comment by Colin Bester [ 09/Nov/18 ] | |||||||||||||||||||||||
|
Thanks @Jeff Yemin , appreciated.
| |||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 09/Nov/18 ] | |||||||||||||||||||||||
|
Hi colin.bester@advancedbiotechnologiesllc.com Thank you for this report. Since the compression support is in the underlying Java driver, I've move this issue into the JAVA project. This is not a known issue so we will investigate as soon as possible. For now, please keep compression disabled until we are able to provide a fix. |