[JAVA-2337] Earlier detection when replica becomes unreachable Created: 07/Oct/16 Updated: 09/Jan/18 Resolved: 09/Jan/18 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Cluster Management |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | 3.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergey Polovko | Assignee: | Jeffrey Yemin |
| Resolution: | Done | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
There are several possible ways to lose a replica server from the cluster. The async mongo-java-driver handles these situations differently. I don't know how the synchronous driver behaves because I'm not using it; it may have the same problems.

Here is what happens from the async driver's perspective.

In the first case we start receiving TCP segments with the RST bit set, the connection is dropped, and the operation in progress fails with MongoSocketReadException or MongoSocketWriteException. The DefaultServer class then invalidates the connection pool and changes the server state to UNKNOWN. This detects that the replica is down early (usually before the monitoring thread notices) and avoids using probably broken connections in subsequent operations.

In the second case we get nothing in response, so the only way to detect the problem is a socket timeout. If nothing arrives on the socket within the configured time, we get MongoSocketReadTimeoutException. But DefaultServer does not invalidate the connection pool in that case. The server state is not changed and we continue to use that server in subsequent operations. Because the current connection has not been released yet (we are still waiting for a response from the unreachable host), subsequent operations try to open new connections, fail, and get MongoSocketOpenException after the configured socket connect timeout. This exception is also ignored by DefaultServer. All we can do is wait until the monitoring thread wakes up after the configured HeartbeatFrequency, tries to read the new server state, blocks, hits its own socket read timeout, and finally changes the server state to UNKNOWN. |
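The difference between the two cases can be sketched with a small stand-alone model. This is not the driver's code: `ReplicaFailureSketch`, `ServerTracker`, and the `invalidateOnTimeout` flag are hypothetical names, and the JDK exceptions `SocketException`, `SocketTimeoutException`, and `ConnectException` stand in for MongoSocketReadException/MongoSocketWriteException, MongoSocketReadTimeoutException, and MongoSocketOpenException respectively. With the flag off, the model mirrors the behaviour reported above; with it on, it shows the earlier detection this ticket asks for, as one possible policy only.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;
import java.net.SocketTimeoutException;

public class ReplicaFailureSketch {

    /** Simplified server states; the real driver tracks a richer description. */
    public enum ServerState { CONNECTED, UNKNOWN }

    /** Hypothetical stand-in for the bookkeeping the report attributes to DefaultServer. */
    public static class ServerTracker {
        private final boolean invalidateOnTimeout;
        private ServerState state = ServerState.CONNECTED;
        private int poolGeneration = 0;

        public ServerTracker(boolean invalidateOnTimeout) {
            this.invalidateOnTimeout = invalidateOnTimeout;
        }

        /** Called when an operation fails with a socket-level error. */
        public void onOperationFailure(IOException e) {
            boolean timeoutOrOpenFailure =
                    e instanceof SocketTimeoutException || e instanceof ConnectException;
            if (timeoutOrOpenFailure && !invalidateOnTimeout) {
                // Reported behaviour: read timeouts and open failures are ignored,
                // so the server stays selectable until the monitor notices.
                return;
            }
            // Any other I/O failure (for example a reset connection) lands here too.
            invalidate();
        }

        private void invalidate() {
            poolGeneration++;            // pooled connections from older generations are stale
            state = ServerState.UNKNOWN; // server selection should now skip this server
        }

        public ServerState state() { return state; }
        public int poolGeneration() { return poolGeneration; }
    }

    public static void main(String[] args) {
        ServerTracker reset = new ServerTracker(false);
        reset.onOperationFailure(new SocketException("Connection reset"));
        System.out.println("connection reset:  " + reset.state());    // UNKNOWN

        ServerTracker reported = new ServerTracker(false);
        reported.onOperationFailure(new SocketTimeoutException("read timed out"));
        System.out.println("timeout, ignored:  " + reported.state()); // CONNECTED

        ServerTracker requested = new ServerTracker(true);
        requested.onOperationFailure(new SocketTimeoutException("read timed out"));
        System.out.println("timeout, handled:  " + requested.state()); // UNKNOWN
    }
}
```

In this model the second tracker keeps handing out the unreachable server until something else marks it UNKNOWN, which is the waiting period the description attributes to the monitoring thread.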
| Comments |
| Comment by Githook User [ 09/Jan/18 ] |
|
Author: Jeff Yemin (username: jyemin, email: jeff.yemin@10gen.com) Message: |
| Comment by Jeffrey Yemin [ 22/Aug/17 ] |
|
The spec that describes this behavior has been updated to reflect this use case. See here for the relevant change. So we can proceed with this in the next release. Happy to accept a pull request if you're so inclined. |
| Comment by Sergey Polovko [ 27/Feb/17 ] |
|
Hi Jeff, Do you have any thoughts on this use case? I would be happy to help and provide a PR if you think this issue needs one. Thanks, |
| Comment by Jeffrey Yemin [ 19/Dec/16 ] |
|
Hi Sergey, We are considering this use case and will get back to you when we've decided how to handle it. Regards, |