[JAVA-1196] Lot of Read Operation Failed after upgraded to driver 2.12.1 Created: 30/Apr/14  Updated: 25/Jun/14  Resolved: 30/Apr/14

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 2.12.1
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Mathieu Carbou Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Hello,

We were impacted by the bugs resolved in version 2.12.1
(https://github.com/mongodb/mongo-java-driver/releases/tag/r2.12.1)

Thus we upgraded this morning to the new driver and after the production push we get a lot of errors coming from MongoDriver, which crashes preventing web request to be fulfilled, thus having a high rate of http status code 500 errors.

Our application is hosted on Heroku, where there were also no issue at the moment.

We are using a MongoHQ Master/Slave replication set running MongoDB 2.6.0, which run well. There were no failover that could cause the issue.

I cannot say whether Heroku platform or MongoHQ could have broke the DB connection for some reason.

On MongoHQ, the current master is candidate.6, candidate.5 being the slave. There is also another slave and an arbitrer, but they are not used in the connection string.

The connection string is:

"mongodb://<user>:<pass>@candidate.6.mongolayer.com:10151,candidate.5.mongolayer.com:10151/guestful"

In any case, here are the exceptions we get:

Caused by: com.mongodb.MongoException$Network: Read operation to server c151.candidate.6.mongolayer.com:10151 failed on database guestful 
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:308) 
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:279) 
    at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84) 
    at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66) 
    at com.mongodb.DBCollection.findOne(DBCollection.java:870) 
    at com.mongodb.DBCollection.findOne(DBCollection.java:844) 
    at com.mongodb.DBCollection.findOne(DBCollection.java:802) 
    at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) 
    at groovy.lang.MetaMethod$doMethodInvoke.call(Unknown Source) 
    at com.gmongo.internal.Patcher._invoke(Patcher.groovy:49) 
    at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) 
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361) 
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909) 
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66) 
    at com.gmongo.internal.Patcher$__patchInternal_closure1.doCall(Patcher.groovy:38) 
    at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaMethod.invoke(ClosureMetaMethod.java:80) 
    at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:1108) 
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909) 
    at org.codehaus.groovy.runtime.callsite.PerInstancePojoMetaClassSite.call(PerInstancePojoMetaClassSite.java:35) 
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120) 
    at com.guestful.backend.service.DB$GCollection.findOne(DB.groovy:186) 
    at com.guestful.backend.service.DB$GCollection$findOne$12.call(Unknown Source) 
    at com.guestful.backend.rest.service.RestaurantHelper.findOne(RestaurantHelper.groovy:61) 
    at com.guestful.backend.rest.service.RestaurantHelper$findOne.call(Unknown Source) 
    at com.guestful.backend.rest.RestaurantAvailabilityResource.getAvailabilities(RestaurantAvailabilityResource.groovy:125) 
    at com.guestful.backend.rest.RestaurantAvailabilityResource$getAvailabilities.callCurrent(Unknown Source) 
    at com.guestful.backend.rest.RestaurantAvailabilityResource.getAvailabilitiesForRestaurant(RestaurantAvailabilityResource.groovy:44) 
    at sun.reflect.GeneratedMethodAccessor613.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) 
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) 
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) 
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) 
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) 
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) 
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) 
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) 
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) 
    	... 59 common frames omitted 
Caused by: java.io.EOFException: null 
    at org.bson.io.Bits.readFully(Bits.java:50) 
    at org.bson.io.Bits.readFully(Bits.java:35) 
    at org.bson.io.Bits.readFully(Bits.java:30) 
    at com.mongodb.Response.<init>(Response.java:42) 
    at com.mongodb.DBPort$1.execute(DBPort.java:141) 
    at com.mongodb.DBPort$1.execute(DBPort.java:135) 
    at com.mongodb.DBPort.doOperation(DBPort.java:164) 
    at com.mongodb.DBPort.call(DBPort.java:135) 
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:300) 
    	... 109 common frames omitted

and

com.mongodb.MongoException$Network: Operation on server c151.candidate.6.mongolayer.com:10151 failed 
    at com.mongodb.DBTCPConnector.doOperation(DBTCPConnector.java:225) 
    at com.mongodb.DBCollectionImpl.receiveWriteCommandMessage(DBCollectionImpl.java:489) 
    at com.mongodb.DBCollectionImpl.writeWithCommandProtocol(DBCollectionImpl.java:428) 
    at com.mongodb.DBCollectionImpl.updateWithCommandProtocol(DBCollectionImpl.java:415) 
    at com.mongodb.DBCollectionImpl.update(DBCollectionImpl.java:267) 
    at com.mongodb.DBCollection.update(DBCollection.java:191) 
    at com.mongodb.DBCollection.update(DBCollection.java:224) 
    at sun.reflect.GeneratedMethodAccessor491.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) 
    at groovy.lang.MetaMethod$doMethodInvoke.call(Unknown Source) 
    at com.gmongo.internal.Patcher._invoke(Patcher.groovy:49) 
    at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) 
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:361) 
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909) 
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66) 
    at com.gmongo.internal.Patcher$__patchInternal_closure1.doCall(Patcher.groovy:38) 
    at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) 
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaMethod.invoke(ClosureMetaMethod.java:80) 
    at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:1108) 
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909) 
    at org.codehaus.groovy.runtime.callsite.PerInstancePojoMetaClassSite.call(PerInstancePojoMetaClassSite.java:35) 
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:128) 
    at com.guestful.backend.service.DB$GCollection.update(DB.groovy:202) 
    at com.guestful.backend.service.DB$GCollection.update(DB.groovy) 
    at com.guestful.backend.service.DB$GCollection$update$15.call(Unknown Source) 
    at com.guestful.backend.event.handler.Stats.onAvailabilityCallEvent(Stats.groovy:129) 
    at sun.reflect.GeneratedMethodAccessor542.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at sun.reflect.GeneratedMethodAccessor499.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:491) 
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189) 
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53) 
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45) 
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:55) 
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120) 
    at com.ovea.tajin.framework.async.guava.EventHandler.handleEvent(EventHandler.groovy:72) 
    at com.ovea.tajin.framework.async.guava.EventHandler$handleEvent.call(Unknown Source) 
    at com.ovea.tajin.framework.async.guava.AsyncEventBus$1.run(AsyncEventBus.groovy:100) 
    at java_lang_Runnable$run.call(Unknown Source) 
    at com.ovea.tajin.framework.async.guice.AsyncModule$Task.run(AsyncModule.groovy:154) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:724) 
Caused by: java.io.EOFException: null 
    at org.bson.io.Bits.readFully(Bits.java:50) 
    at org.bson.io.Bits.readFully(Bits.java:35) 
    at org.bson.io.Bits.readFully(Bits.java:30) 
    at com.mongodb.Response.<init>(Response.java:42) 
    at com.mongodb.DBCollectionImpl$3.execute(DBCollectionImpl.java:492) 
    at com.mongodb.DBCollectionImpl$3.execute(DBCollectionImpl.java:489) 
    at com.mongodb.DBTCPConnector.doOperation(DBTCPConnector.java:218) 
    	... 54 common frames omitted

There are very frequent compared to before 2.12.1 driver version, and cause requests to fail.



 Comments   
Comment by Mathieu Carbou [ 25/Jun/14 ]

PVI - in our case, this issue was caused when using Mongo versio 2.6.0: the process kept crashing with a segfault.
MongoHQ witched us to the most up to date version at the time (it was 2.6.1 rc1 as I can remember). Then the issue was fixed.
Now we are running on Mongo 2.6.1 without any issues.

Comment by Jeffrey Yemin [ 25/Jun/14 ]

mnorman please let us know if you can provide any more details, especially application logs.

Comment by Michael D. Norman [ 25/Jun/14 ]

We noticed this issue as well, with 2.12.0 and 2.12.2. We are downgrading to 2.11.3 to see if it fixes the issue. We are completely in AWS with our own setup (no 3rd party involved), and we weren't seeing this issue before the upgrade to 2.12.0.

Comment by Mathieu Carbou [ 30/Apr/14 ]

Ok thank you!
I finally reached MongoHQ support which escalated the issue and deployed another version in production with the "com.mongodb" package in TRACE mode. So if the error happens again I will have plenty of logs for you

Comment by Jeffrey Yemin [ 30/Apr/14 ]

Hi Mathieu,

OK, I will close this but don't hesitate to open a new issue if you again begin to suspect the driver.

Comment by Mathieu Carbou [ 30/Apr/14 ]

Note: I logged this issue because the problems occured just after the migration to the new driver version, but I strongly fear that the issue was much more on MongoHQ side. I contacted their support a while ago, left two mesages but had no response.

Perhaps you could just close this issue because I cannot confirm it... If we were to have again a lot of these excetions while MongoHQ is working fine, I will TRACE The mongo driver package and send you the logs.

Thank you for the fast feedback !

Comment by Mathieu Carbou [ 30/Apr/14 ]

Sadly I do not have MongoHQ detailed logs and to be able to TRACE the "com.mongodb" package I would need another production deployment. I checked deeper on the stepdown alerts on our MongoHQ console and I saw a bunch of master / slave changes:

Recent stepdown alerts
 
2014-04-30 16:53:23 UTC  -  1 hour ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151
 
2014-04-30 16:43:40 UTC  -  1 hour ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151
 
2014-04-30 16:43:00 UTC  -  1 hour ago
There was a stepdown c151.candidate.6.mongolayer.com:10151 -> c151.candidate.5.mongolayer.com:10151
 
2014-04-30 15:55:40 UTC  -  2 hours ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151
 
2014-04-30 15:36:20 UTC  -  2 hours ago
There was a stepdown c151.candidate.6.mongolayer.com:10151 -> c151.candidate.5.mongolayer.com:10151
 
2014-04-30 15:32:40 UTC  -  2 hours ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151
 
2014-04-30 15:15:23 UTC  -  3 hours ago
There was a stepdown c151.candidate.6.mongolayer.com:10151 -> c151.candidate.5.mongolayer.com:10151
 
2014-04-30 14:47:58 UTC  -  3 hours ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151
 
2014-04-30 14:38:20 UTC  -  3 hours ago
There was a stepdown c151.candidate.6.mongolayer.com:10151 -> c151.candidate.5.mongolayer.com:10151
 
2014-04-30 09:43:54 UTC  -  8 hours ago
There was a stepdown c151.candidate.5.mongolayer.com:10151 -> c151.candidate.6.mongolayer.com:10151

What is strange is that we were emailed about those stepdowns, but only just one,

Do you think those EOF issues could be caused by the master shutdown and the slave taking its place ? I see a lot of coïncidence between the exception trace times and the stepdown logs from MongoHQ.

Comment by Jeffrey Yemin [ 30/Apr/14 ]

Mathieu,

By the way, if you are unable to share the logs in this public forum we can provide a private means for that. Please let us know.

Thanks,
Jeff

Comment by Jeffrey Yemin [ 30/Apr/14 ]

Can you provide Java application logs for anything prefixed with "com.mongodb"? If you can get access to server logs for c151.candidate.6.mongolayer.com:10151, we'd like to see those too.

The EOFException with that stack trace indicates that the call to SocketInputStream.read returned -1, which indicates a failed attempt to read the reply from the socket.

Comment by Jeffrey Yemin [ 30/Apr/14 ]

Can you roll back while we gather more information?

Comment by Mathieu Carbou [ 30/Apr/14 ]

We are using this Java code to setup the connections:

MongoClientURI mongoClientURI = new MongoClientURI(Env.MONGOHQ_URL)
GMongo gMongo = new GMongo(new MongoURI(mongoClientURI))
return gMongo.getDB(mongoClientURI.database)

We do not use any special connection string parameter.

Generated at Thu Feb 08 08:54:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.