[JAVA-1881] Java application CPU 100% when mongod replica set member is stopped Created: 07/Jul/15 Updated: 01/Apr/16 Resolved: 10/Jul/15 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Async, Connection Management |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | 3.1.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Cory Thomas | Assignee: | Ross Lawley |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
java version "1.7.0_79" Amazon EC2 instances running Amazon Linux. mongod 3.0.4 |
||
| Attachments: |
|
| Description |
|
The non-default configuration options used by the application:
We have a replica set with 3 members, identified by host names mongodb-oracle-a1-ci-set1, mongodb-oracle-b1-ci-set1, and mongodb-oracle-c1-ci-set1. There is a Java application running on a fourth host. If I stop any one of the mongod daemons, primary or secondary, while the application is running, CPU for the application process goes to 100% and the application is unusable until restarted. I can replicate the issue consistently. Attached is the log output when replicating the issue.
Note that no mongo queries are executed. The application is sitting idle when all this occurs. Once in a bad state, the application is unresponsive and must be restarted. |
| Comments |
| Comment by Jeffrey Yemin [ 16/Jul/15 ] |
|
Fix is now available in the 3.0.3 release. |
| Comment by Cory Thomas [ 10/Jul/15 ] |
|
Awesome! Thanks a bunch! |
| Comment by Ross Lawley [ 10/Jul/15 ] |
|
Fixed in master and on the 3.0.x branch. The 3.0.3 release is currently scheduled for next week and a snapshot with this fix will be available shortly. Alternatively, tests with netty showed that it wasn't impacted by this bug and is a possible workaround until then. |
| Comment by Githook User [ 10/Jul/15 ] |
|
Author: {u'username': u'rozza', u'name': u'Ross Lawley', u'email': u'ross.lawley@gmail.com'}Message: Check for premature end of stream in Async
|
| Comment by Githook User [ 10/Jul/15 ] |
|
Author: {u'username': u'rozza', u'name': u'Ross Lawley', u'email': u'ross.lawley@gmail.com'}Message: Check for premature end of stream in Async
|
| Comment by Ross Lawley [ 10/Jul/15 ] |
|
Thanks cthomas, thats the issue unlike the sync version there isn't a guarded check for this. |
| Comment by Cory Thomas [ 09/Jul/15 ] |
|
I think I have found a clue. I connected a debugger up to my simple app running on an EC2 instance. The problem seems to be located in the AsynchronousSocketChannelStream completion handler. According to AsynchronousSocketChannel documentation, the result value is -1 if no bytes could be read because the channel has reached end of stream. In the debugger, I am seeing the result value as -1 when the driver is trying to execute ismaster command. However, since the driver does not handle the result value, it just tries to read again. The read operation completes immediately because the channel is closed, so the CPU goes way up. |
| Comment by Cory Thomas [ 09/Jul/15 ] |
|
I can not replicate the issue with a 3 node replica set running locally (3 mongo daemons running on localhost) on my OSX machine. I can reliably replicate the issue on 4 EC2 linux instances. 3 machines running mongo daemons and one machine running the provided application. How I run it:
|
| Comment by Ross Lawley [ 09/Jul/15 ] |
|
Unfortunately, I've been unable to reproduce with your example code. I've run it in IntelliJ and as a standalone jar, both times when taking down members of the replicaset I've seen the expected connection refused error messages. I haven't seen CPU usage rise and I've also been able to stop and start the same member of the replicaset multiple times without any visible impact on CPU. Something is missing in my test case - how are you running the local app? |
| Comment by Cory Thomas [ 08/Jul/15 ] |
|
It appears I was mistaken when I said stopping the master or a secondary causes the issue. Stopping the master does not appear to cause a problem; only stopping a secondary. In fact, I can kill the master, start it again so it comes up as a secondary (driver handles that fine), and then immediately stop it again (driver goes crazy). I attached a very simple java app that I am now using to reproduce the problem. All it does is create a MongoClient object and then sleep. |
| Comment by Ross Lawley [ 08/Jul/15 ] |
|
Thanks, let me try and reproduce. |
| Comment by Cory Thomas [ 08/Jul/15 ] |
|
| Comment by Ross Lawley [ 08/Jul/15 ] |
|
Hi cthomas, I have a couple of questions: 1. Can you describe in more detail what the application is doing - when you say idling I'm not sure what that means? Is it in some event loop checking mongodb, on a web server awaiting requests. Ross |