[JAVA-452] allow to modify the connection timeout for the maintenance thread Created: 18/Oct/11 Updated: 18/Aug/13 Resolved: 18/Aug/13 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Cluster Management |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Antoine Girbal | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
See http://groups.google.com/group/mongodb-user/browse_thread/thread/edc5445822ca5f4c. Right now the maintenance thread's connection timeout is set to 20s and cannot be modified. |
| Comments |
| Comment by Jeffrey Yemin [ 18/Aug/13 ] |
|
A system property (com.mongodb.updaterConnectTimeoutMS) was added for this a long time ago. |
| Comment by Jeffrey Yemin [ 10/Mar/12 ] |
|
Examined the source code a bit more closely, and it looks like you can control the timeouts for the replica set maintenance thread: _mongoOptionsDefaults.connectTimeout = Integer.parseInt(System.getProperty("com.mongodb.updaterConnectTimeoutMS", "20000")); This is not the circuit breaker that you described, but it gives you some control at least. |
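A minimal sketch of how the property is consumed, mirroring the System.getProperty line quoted above. The class and method names are hypothetical; only the property name com.mongodb.updaterConnectTimeoutMS and the "20000" default come from the driver snippet. Because the driver presumably reads the property when it initializes its defaults, it is safest to set it (for example with -Dcom.mongodb.updaterConnectTimeoutMS=5000 on the JVM command line) before any driver class is loaded.

```java
public class UpdaterTimeoutDemo {
    // Property name and default value taken from the driver snippet quoted above.
    static final String PROPERTY = "com.mongodb.updaterConnectTimeoutMS";
    static final String DEFAULT_MS = "20000";

    // Hypothetical helper: resolves the timeout the same way the quoted line does.
    static int updaterConnectTimeoutMs() {
        return Integer.parseInt(System.getProperty(PROPERTY, DEFAULT_MS));
    }

    public static void main(String[] args) {
        // Prints 20000 unless the JVM was started with -Dcom.mongodb.updaterConnectTimeoutMS=...
        System.out.println("current: " + updaterConnectTimeoutMs() + " ms");

        // Equivalent to passing -Dcom.mongodb.updaterConnectTimeoutMS=5000 to the JVM.
        // With the real driver this must happen before the driver reads the property.
        System.setProperty(PROPERTY, "5000");
        System.out.println("override: " + updaterConnectTimeoutMs() + " ms");
    }
}
```

Passing the value with -D keeps it out of application code and guarantees it is in place before any driver class reads it.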
| Comment by Antoine Girbal [ 18/Oct/11 ] |
|
The maintenance thread runs in the background, so regular queries do not wait on it unless it's the first query ever made through the driver. That being said, I would not be surprised if some of our error-handling code still tries to update the master itself, which could trigger a thread buildup as you describe. |
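To illustrate the behaviour described above, here is a minimal sketch (not the driver's actual code) of a background updater: a daemon thread periodically refreshes a cached primary address, and callers block only until the first refresh has completed. PrimaryTracker, the probePrimary supplier, and the refresh interval are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class PrimaryTracker implements AutoCloseable {
    private final Supplier<String> probePrimary;          // hypothetical: asks the replica set who is primary
    private final CountDownLatch firstUpdate = new CountDownLatch(1);
    private final ScheduledExecutorService scheduler;
    private volatile String primary;                      // last known primary, read without locking

    public PrimaryTracker(Supplier<String> probePrimary, long intervalMs) {
        this.probePrimary = probePrimary;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "replica-set-updater");
            t.setDaemon(true); // never keeps the JVM alive on its own
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::refresh, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
        try {
            primary = probePrimary.get(); // a slow probe only delays this background thread
            firstUpdate.countDown();
        } catch (RuntimeException e) {
            // Keep the previous value. Swallowing the exception matters: an uncaught
            // exception would cancel all further scheduled runs of this task.
        }
    }

    // Callers wait only until the first successful probe; afterwards they read the cached value.
    public String currentPrimary(long timeoutMs) throws InterruptedException {
        if (!firstUpdate.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("no primary discovered yet");
        }
        return primary;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this design a slow or failing probe never blocks callers that arrive after the first successful refresh; they simply see a possibly stale primary until the next refresh succeeds.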
| Comment by Fabio Pugliese Ornellas [ 18/Oct/11 ] |
|
Hello,

The root problem I see is that if MongoDB becomes unavailable (say, due to a network issue), your queries go from a few tens of milliseconds (e.g. 20 ms) to a few seconds (e.g. a 2 s timeout). Since your QPS stays the same (users keep accessing the site), the number of concurrent connections grows by the same factor (Little's law), in this case 100x. This usually exhausts every thread limit, in the load balancer, Apache, Jetty, etc., and makes the whole app unavailable. If you use an aggressive timeout instead, you might drop legitimate requests, since a small percentage of them are slower even in normal operation.

Michael Nygard, in his book "Release It!", describes a way to fix this: implement a circuit breaker. That is, if the external resource is down, you fail fast and return an immediate error. Even during an outage you won't be slower; in fact, you will be faster. This keeps you from reaching the thread limits.

Implementing a configurable timeout is better than having a fixed 10s default, but it might not hold the app up during an outage.

Cheers. |
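Little's law (L = λW) makes the arithmetic concrete: at a hypothetical 500 requests per second, a 0.02 s response time means about 10 requests in flight, while a 2 s timeout means about 1,000. The circuit breaker Nygard describes avoids that buildup. Below is a minimal single-class sketch that is not part of the Java driver; the class name, failure threshold, and cool-down period are assumptions. After a number of consecutive failures the breaker opens and calls fail immediately; once the cool-down has elapsed, one trial call is let through (half-open), and a success closes the breaker again.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

public class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before the breaker opens
    private final long coolDownMs;        // how long to fail fast before allowing a trial call
    private final LongSupplier clockMs;   // injectable clock, e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    public CircuitBreaker(int failureThreshold, long coolDownMs, LongSupplier clockMs) {
        this.failureThreshold = failureThreshold;
        this.coolDownMs = coolDownMs;
        this.clockMs = clockMs;
    }

    public <T> T call(Callable<T> action) throws Exception {
        synchronized (this) {
            if (state == State.OPEN) {
                if (clockMs.getAsLong() - openedAtMs < coolDownMs) {
                    // Fail fast: no thread is tied up waiting on a dead resource.
                    throw new IllegalStateException("circuit open: failing fast");
                }
                state = State.HALF_OPEN; // let exactly this call through as a trial
            } else if (state == State.HALF_OPEN) {
                throw new IllegalStateException("circuit half-open: trial call in progress");
            }
        }
        try {
            T result = action.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        state = State.CLOSED;
        consecutiveFailures = 0;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMs = clockMs.getAsLong();
        }
    }

    public synchronized String state() {
        return state.name();
    }
}
```

In an application one might create a single shared instance, e.g. new CircuitBreaker(5, 10_000, System::currentTimeMillis), and wrap each database call in breaker.call(...). The injectable clock is only there so the state transitions can be tested without sleeping.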