[JAVA-644] SocketException causes cleanup of the entire DBPortPool leading to OutOfMemoryErrors in high load applications Created: 18/Sep/12 Updated: 04/Dec/13 Resolved: 04/Dec/13 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Connection Management |
| Affects Version/s: | 2.8.0, 2.9.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Gacki | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Oracle JVM, Linux 64bit | ||
| Description |
|
Hi, I have an application under high load connected to MongoDB (2.0.7) using the Java driver. When the DB fails to allocate memory (mmap), the connection is reset on the client side. The resulting SocketException causes a cleanup of the entire pool of socket connections in the DBPortPool implementation. Is there a reason for that? Cleaning up the entire pool leads to OutOfMemoryErrors, since connection (de)allocation is expensive, and the failure of a single DB operation can affect the entire client application. Would it not be better to clean up just the connection that was reset? Thanks |
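To illustrate the two strategies being compared, here is a minimal, hypothetical sketch of a connection pool that either discards everything on a socket error or discards only the failed connection. The class and method names (SimplePool, clearAll, onSocketError) are illustrative and do not correspond to the actual DBPortPool code.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified pool -- not the real DBPortPool -- contrasting the
// two cleanup strategies discussed in this ticket.
public class SimplePool {
    private final ConcurrentLinkedQueue<Socket> idle = new ConcurrentLinkedQueue<Socket>();

    // Behaviour reported above: a single socket error discards every pooled
    // connection, forcing the pool to be rebuilt from scratch under load.
    void clearAll() {
        Socket s;
        while ((s = idle.poll()) != null) {
            closeQuietly(s);
        }
    }

    // Alternative suggested in the description: discard only the connection
    // that failed and leave the rest of the pool intact.
    void onSocketError(Socket failed) {
        idle.remove(failed); // it may or may not still be sitting in the idle queue
        closeQuietly(failed);
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // best-effort cleanup
        }
    }
}
```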
| Comments |
| Comment by Jeffrey Yemin [ 04/Dec/13 ] |
|
Thanks for the reply. Closing. Jeff |
| Comment by Robert Gacki [ 04/Dec/13 ] |
|
Hi Jeff, |
| Comment by Jeffrey Yemin [ 04/Dec/13 ] |
|
Hi Robert, Do you have any updates? If not, I'm going to close this ticket. |
| Comment by Robert Gacki [ 19/Jul/13 ] |
|
Hi Jeff, I'm not sure either, but it was reproducible at that time. I suspect the overhead of creating new connections. Does the driver exchange data first when it connects? Maybe it's the buffers filling up all at once when the HTTP requests hit the application. In any case, my project plans to do load tests again; I will bring this up so we can analyse it further. Best, |
| Comment by Jeffrey Yemin [ 12/Jul/13 ] |
|
Robert, Can you explain how controlling the rate of connection allocations would result in any fewer live objects on the heap in the end? It seems to me that we ultimately end up with the same live heap, and you either have enough memory to hold it or you don't. |
| Comment by Robert Gacki [ 25/Jun/13 ] |
|
1. Xmx was set to 3g. A common solution is to let the pool warm up by configuring a minimum pool size. The pool then allocates new connections at a controlled rate (in a separate thread, with a configurable rate) instead of being populated by a herd of requests, and becomes available once the allocation is finished. In my case, I could set that minimum size to match the pool's maximum size. It's better for me to wait a few seconds longer for the application to become available than to have it fail with an OOM and need the JVM restarted. |
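For reference, a minimal sketch of the warm-up configuration described in the comment above. It assumes a driver version that exposes a minimum pool size (MongoClientOptions.minConnectionsPerHost appeared in later 2.x releases, not in the 2.8/2.9 versions this ticket affects), and the pool sizes are illustrative values only.

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

public class PoolWarmupExample {
    public static void main(String[] args) throws Exception {
        // Keep the minimum pool size equal to the maximum so the driver's
        // maintenance thread establishes connections at a controlled rate,
        // instead of a herd of request threads opening them all at once.
        MongoClientOptions options = MongoClientOptions.builder()
                .minConnectionsPerHost(100) // illustrative value
                .connectionsPerHost(100)    // illustrative value
                .build();

        MongoClient client = new MongoClient(new ServerAddress("localhost", 27017), options);
        try {
            // ... application work ...
        } finally {
            client.close();
        }
    }
}
```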
| Comment by Jeffrey Yemin [ 25/Jun/13 ] |
|
A few questions:
If the primary goes down, all the connections will be dropped by the server, so the connection pool needs to be cleared in any case. Do you have an alternative to suggest? |
| Comment by Robert Gacki [ 25/Jun/13 ] |
|
Hi, at that time we did load tests, and in this scenario we simulated an outage of the master. When that happened, the #gotError method of the DBPortPool class received either a SocketException or an EOFException (BSON) and all connections of the pool were closed. The OOM occurred while the load was still high and the pool was being repopulated with new connections after the (new) master became available again. From my POV, the driver should not cause OOMs when there is a connectivity problem, even under high load, because fail-over is a feature of the driver. So I questioned the strategy of dumping the entire pool. Of course, if there is another way to mitigate the problem, I'd appreciate any hints / best practices. Best, |
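One possible mitigation for the repopulation spike described above, sketched here as an assumption rather than an official recommendation: cap the pool size, limit how many threads may block waiting for a connection, and fail fast instead of queueing unbounded work. The option values are illustrative.

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

public class BoundedPoolExample {
    public static void main(String[] args) throws Exception {
        MongoClientOptions options = MongoClientOptions.builder()
                .connectionsPerHost(50)                          // illustrative pool cap
                .threadsAllowedToBlockForConnectionMultiplier(5) // at most 5 * 50 threads may wait
                .maxWaitTime(2000)                               // ms to wait for a connection before failing
                .build();

        MongoClient client = new MongoClient(new ServerAddress("localhost", 27017), options);
        try {
            // ... application work ...
        } finally {
            client.close();
        }
    }
}
```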
| Comment by Jeffrey Yemin [ 25/Jun/13 ] |
|
It's not clear to me how closing a connection, even if an expensive operation, would lead to OOM. Can you elaborate? |