[JAVA-1206] Incorrect ping information causes continuous updating cluster information Created: 09/May/14 Updated: 23/Jun/15 Resolved: 23/Jun/15 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Jakub Kubrynski | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | linux |
| Description |
|
After updating the Java driver to 2.12.0, GC activity rises by several dozen percent. In the logs I can find: [...] But ping gives: [...] In fact, thousands of DBPort and MongoOptions objects are created every second, which kills the servers in less than an hour. |
| Comments |
| Comment by Jeffrey Yemin [ 23/Jun/15 ] | ||||||||
|
Closing this, as I strongly suspect that it's been fixed in later patches in the 2.12 driver series. In particular, by | ||||||||
| Comment by MOISES J S TROVO [ 31/Jul/14 ] | ||||||||
|
We're using version 2.12.0, and I can pinpoint this problem to this version because yesterday we rolled our client back to 2.11.0 and the problem is gone for now. Yes, I saw it in the code, and it's very weird: in 30 minutes it received this event from the cluster 3.5 million times (an average of 1,944 events/sec for a 2-node cluster!). I'm away from my desk right now, but I will try it and see if it works better. | ||||||||
| Comment by Jeffrey Yemin [ 31/Jul/14 ] | ||||||||
|
Moises, which driver version are you using? MultiServerCluster.handleReplicaSetMemberChanged adds a new server to the cluster only if the server is not already in it. See https://github.com/mongodb/mongo-java-driver/blob/2.12.x/src/main/com/mongodb/MultiServerCluster.java#L212-219. It would be useful if those experiencing this problem tried the 2.12.3 patch release, as there have been a number of important fixes to the cluster-monitoring implementation since 2.12.0. Anyone willing to try? Regards, | ||||||||
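The guard described here can be sketched roughly as follows (hypothetical names, not the driver's actual code): a member-changed event allocates a server only when its address is not already tracked, so repeated events for the same member should create nothing new.

```java
import java.util.HashMap;
import java.util.Map;

/** Rough sketch of an "add only if absent" guard for cluster members. */
public class ClusterGuardSketch {
    private final Map<String, Object> addressToServer = new HashMap<>();
    int serversCreated = 0;

    void onMemberChanged(String address) {
        // Allocate a server only for addresses we are not already tracking.
        addressToServer.computeIfAbsent(address, a -> {
            serversCreated++;
            return new Object(); // stand-in for a monitored server
        });
    }

    public static void main(String[] args) {
        ClusterGuardSketch cluster = new ClusterGuardSketch();
        for (int i = 0; i < 1000; i++) {
            cluster.onMemberChanged("host1:27017");
            cluster.onMemberChanged("host2:27017");
        }
        System.out.println(cluster.serversCreated); // 2, despite 2000 events
    }
}
```

If the allocation rate observed in the profiler really came from this path, it would suggest the guard was being bypassed or the events carried addresses that never matched the tracked ones.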
| Comment by MOISES J S TROVO [ 29/Jul/14 ] | ||||||||
|
I'm hitting this same problem on my cluster. I ran Mission Control and there's a lot of GC pressure coming from the DBPort / ServerStateNotifier classes in the MongoDB driver: about 3.5 million instances and 108 GB of total memory usage in a 30-minute dump. Looking at the code, it seems to be a problem with com.mongodb.MultiServerCluster.handleReplicaSetMemberChanged(ServerDescription), where it adds new servers to the cluster every time there's a change in the cluster. | ||||||||
| Comment by Jakub Kubrynski [ 10/May/14 ] | ||||||||
|
One time I saw MongoSocketException.Network objects. Unfortunately I don't have saved heap dumps containing those objects, but as I remember the exception comes from the ensureOpen method. I have autoReconnect = false. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
Can you explain what you mean when you say that "there was Network exception in heap dump"? Can you add the exception stack trace (if you have it) to a comment on this issue? | ||||||||
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
Currently I pass only one server. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
There should be a log message generated from here: https://github.com/mongodb/mongo-java-driver/blob/2.12.0/src/main/com/mongodb/ServerStateNotifier.java#L102. Perhaps there is a typo in the name of one of the servers in the seed list that the application is passing to the Mongo constructor? | ||||||||
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
Yep - I forgot that there was a Network exception in the heap dump. Maybe you can also extend the logging in that part, because there is nothing in the logs about it. It was thrown from DBPort.ensureOpen, as I remember. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
I don't think it will solve it either, but it might give better diagnostics. ServerStateNotifier caches DBPort instances and only constructs new ones when an IOException is thrown. Given that, we still have not been able to determine the root cause of the high allocation rate. | ||||||||
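The caching behavior described here can be illustrated with a rough sketch (hypothetical types, not the driver's code): as long as checks succeed, one port is reused indefinitely, so a steady stream of new port allocations would imply the monitor is hitting IOExceptions on every round.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Sketch of a monitor that caches its port and replaces it only on IOException. */
public class PortCacheDemo {
    interface Port { void ping() throws IOException; }

    private final Supplier<Port> factory;
    private Port cached;
    int portsCreated = 0;

    PortCacheDemo(Supplier<Port> factory) { this.factory = factory; }

    /** Reuses the cached port; discards it only after an IOException. */
    void check() {
        if (cached == null) {
            cached = factory.get();
            portsCreated++;
        }
        try {
            cached.ping();
        } catch (IOException e) {
            cached = null; // a new port will be created on the next check
        }
    }

    public static void main(String[] args) {
        PortCacheDemo healthy = new PortCacheDemo(() -> () -> { /* ok */ });
        for (int i = 0; i < 100; i++) healthy.check();
        System.out.println("healthy monitor created " + healthy.portsCreated + " port(s)");

        PortCacheDemo failing =
                new PortCacheDemo(() -> () -> { throw new IOException("refused"); });
        for (int i = 0; i < 100; i++) failing.check();
        System.out.println("failing monitor created " + failing.portsCreated + " port(s)");
    }
}
```

Under this model, the healthy monitor allocates a single port for any number of checks, while a monitor whose every check fails allocates one port per check - which would line up with the heap-dump observation of Network exceptions alongside the DBPort churn.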
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
I'll check that on Monday, but I'm not sure it solves the giant allocation rate of DBPort. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
Would it be possible for you to test against the latest 2.12.2-SNAPSHOT? It dedicates a thread per server for monitoring, and each thread name includes the server name. That way we can see if it's one particular server that's causing the issue. The latest SNAPSHOT is at https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongo-java-driver/2.12.2-SNAPSHOT/mongo-java-driver-2.12.2-20140507.143852-5.jar. I opened | ||||||||
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
My fault - I've fixed the ping time only once. I've used a profiler and there was a pretty high allocation rate for those objects. There was only one Mongo object instance. The problem is that GC times are getting longer and longer, and CPU usage is at 100% without any user traffic. As I said, all objects are created in a ThreadPoolExecutor, and the DBPort objects are created by ServerStateNotifier in cluster-2-thread-1. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
Thanks for spotting the bug in the logging statement. Reversing the bug gives us an actual ping time of a bit under 3 ms.
prints:
which is much closer to the expected ping time. How did you calculate the 48-minute value? Looking a bit further, this bug is only in the log message; the calculations everywhere else use the raw value and look correct. So I think the incorrect ping time reported in the logs is not the root cause of the problem. What we still need to figure out is why there appear to be thousands of DBPort and MongoOptions objects created every second. ServerStateNotifier runs only once every 5 seconds for each server in the replica set, as long as all members are reachable, and your logs seem to confirm that behavior. Can you tell me a bit about how you are determining the rate of object creation for DBPort and MongoOptions? Also, when you say that it's killing the server, what do you mean exactly? Are threads throwing OutOfMemoryError? Regards, | ||||||||
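The magnitude of the discrepancy discussed in this thread is consistent with a nanosecond value being logged under a milliseconds label: for example, 2,880,000 ns is a bit under 3 ms, but the same number read as milliseconds is exactly 48 minutes. A minimal sketch (hypothetical variable, not the driver's actual code):

```java
import java.util.concurrent.TimeUnit;

public class PingTimeLogBug {
    public static void main(String[] args) {
        // Hypothetical measured 'ismaster' round-trip, a bit under 3 ms.
        long pingTimeNanos = 2_880_000L;

        // Buggy log line: raw nanosecond value printed with an "ms" label.
        System.out.println("ping: " + pingTimeNanos + " ms");

        // Correct conversion before logging (truncates to whole ms).
        long pingTimeMillis = TimeUnit.MILLISECONDS.convert(pingTimeNanos, TimeUnit.NANOSECONDS);
        System.out.println("ping: " + pingTimeMillis + " ms");

        // The raw value, misread as milliseconds, converts to 48 minutes.
        long misreadMinutes = TimeUnit.MINUTES.convert(pingTimeNanos, TimeUnit.MILLISECONDS);
        System.out.println("misread as: " + misreadMinutes + " min");
    }
}
```

This also explains why only the log message was wrong: code that keeps working with the raw nanosecond value is unaffected by a bad label in the formatting.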
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
Those logs contain only the lines I've pasted, repeated every 5 sec. I've checked, and I can execute db.isMaster() on that host with the same user that is being used by my application. We have two hosts, two passives and one arbiter, all available from the application server, with pings < 0.4 ms to the hosts and arbiter and < 10 ms to the passives. Also, you're incorrectly converting nanos to milliseconds in https://github.com/mongodb/mongo-java-driver/blob/2.12.0/src/main/com/mongodb/ServerDescription.java#L486, because
edit: I've converted the time you reported and it's about 48 minutes. Those logs were collected about 2-3 minutes after starting the server, so it's definitely incorrect. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
The ping time that the driver reports is not the result of an actual ping, so it's confusing that the driver calls it that. It's the average response time of the server to the 'ismaster' command, which is what ServerStateNotifier is calling. For reference: https://github.com/mongodb/mongo-java-driver/blob/2.12.0/src/main/com/mongodb/ServerStateNotifier.java#L131-140 To assist us in getting to the root cause of this issue, can you provide an attachment containing all the log messages generated by the driver over a 30-second period? If that's not something you can do in a public forum, I can make this ticket private. Regards, | ||||||||
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
There are no such servers in the replica set, and as you can see, the Mongo driver reports an unrealistic ping time to a server that is available and whose ping is below 0.4 ms. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
Are there any servers in the replica set that are not reachable by the application? If so, can you try increasing the value of com.mongodb.MongoClientOptions.Builder#heartbeatConnectRetryFrequency to something like 1000 ms? | ||||||||
| Comment by Jakub Kubrynski [ 09/May/14 ] | ||||||||
|
Not only there - those objects are also created in ServerStateNotifier, and I've seen that DBPort and MongoOptions are being allocated in a ThreadPoolExecutor. | ||||||||
| Comment by Jeffrey Yemin [ 09/May/14 ] | ||||||||
|
MongoOptions instances are only created when you create a new Mongo instance. Can you see where in your application you are creating all those MongoOptions instances? |