[JAVA-2337] Earlier detection when replica becomes unreachable Created: 07/Oct/16  Updated: 09/Jan/18  Resolved: 09/Jan/18

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 3.0.0
Fix Version/s: 3.7.0

Type: Bug Priority: Major - P3
Reporter: Sergey Polovko Assignee: Jeffrey Yemin
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DRIVERS-429 Change handling of network errors or ... Closed
Related

 Description   

There are several ways a replica can drop out of the cluster:
1. The mongod process crashes, is killed, or exits.
2. The host becomes unreachable (it is turned off, or a network problem makes it unreachable).

The async mongo-java-driver handles these situations differently. I don't know how the synchronous driver behaves because I don't use it; it may have the same problems.

What happens from the async driver's perspective:

In the first case, we start receiving TCP segments with the RST bit set, the connection is dropped, and we get a MongoSocket(Read|Write)Exception while executing an operation. The DefaultServer class then invalidates the connection pool and changes the server state to UNKNOWN. This detects that the replica is down earlier (usually before the monitoring thread does) and avoids using possibly broken connections in subsequent operations.
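To make "invalidate the connection pool" concrete, here is a minimal single-threaded sketch. It assumes invalidation is implemented with a generation counter: every pooled connection records the generation it was opened in, invalidate() bumps the counter, and connections from an older generation are closed instead of reused. All names here (PoolInvalidationSketch, Pool, onSocketReadOrWriteError, and so on) are hypothetical and are not the driver's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded model of pool invalidation via a generation
// counter. None of these types are the driver's real classes.
class PoolInvalidationSketch {

    enum ServerState { CONNECTED, UNKNOWN }

    // A pooled connection remembers the pool generation it was opened in.
    static final class PooledConnection {
        final int generation;
        boolean closed;

        PooledConnection(int generation) {
            this.generation = generation;
        }
    }

    static final class Pool {
        private int generation;
        private final Deque<PooledConnection> idle = new ArrayDeque<>();

        PooledConnection checkOut() {
            // Skip (and close) idle connections opened before the last invalidate().
            while (!idle.isEmpty()) {
                PooledConnection c = idle.pollFirst();
                if (c.generation == generation) {
                    return c;
                }
                c.closed = true;
            }
            return new PooledConnection(generation); // stands in for opening a socket
        }

        void checkIn(PooledConnection c) {
            if (c.generation == generation) {
                idle.addFirst(c);
            } else {
                c.closed = true; // stale: it was checked out before invalidate()
            }
        }

        void invalidate() {
            generation++;
        }
    }

    static final class Server {
        final Pool pool = new Pool();
        ServerState state = ServerState.CONNECTED;

        // Reaction to a read/write socket error (e.g. the peer sent RST).
        void onSocketReadOrWriteError() {
            pool.invalidate();
            state = ServerState.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        PooledConnection inUse = server.pool.checkOut();
        PooledConnection idle = server.pool.checkOut();
        server.pool.checkIn(idle);           // now sitting idle in the pool
        server.onSocketReadOrWriteError();   // an operation on inUse failed
        server.pool.checkIn(inUse);          // stale, so it is closed
        PooledConnection fresh = server.pool.checkOut(); // idle one is discarded
        System.out.println(inUse.closed + " " + idle.closed + " "
                + (fresh != idle) + " " + server.state); // true true true UNKNOWN
    }
}
```

The generation counter avoids having to track down and close every checked-out connection at the moment of failure; stale connections are simply discarded the next time they pass through the pool.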

In the second case, we get no response at all, so the only way to detect the problem is a socket timeout. If nothing is received from the socket within the configured time, we get a MongoSocketReadTimeoutException. However, DefaultServer does not invalidate the connection pool in this case. The server state is not changed, and we keep using that server in subsequent operations. Because the current connection has not been released yet (it is still waiting for a response from the unreachable host), subsequent operations try to open new connections and fail with MongoSocketOpenException after the configured connect timeout. DefaultServer ignores this exception as well.
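The asymmetry described above can be summarized as a small decision function. This is a hypothetical sketch of the behavior as reported in this issue, not the driver's real DefaultServer code; the exception classes are stand-ins loosely mirroring MongoSocketReadException, MongoSocketWriteException, MongoSocketReadTimeoutException and MongoSocketOpenException.

```java
// Hypothetical stand-ins, loosely mirroring the driver's socket exceptions.
class ErrorHandlingBeforeFixSketch {

    static class SocketError extends RuntimeException {}
    static class SocketReadError extends SocketError {}    // ~ MongoSocketReadException
    static class SocketWriteError extends SocketError {}   // ~ MongoSocketWriteException
    static class SocketReadTimeout extends SocketError {}  // ~ MongoSocketReadTimeoutException
    static class SocketOpenError extends SocketError {}    // ~ MongoSocketOpenException

    enum Phase { OPEN_CONNECTION, RUN_OPERATION }

    // Would this error invalidate the pool and mark the server UNKNOWN,
    // according to the behavior reported in this issue (before the fix)?
    static boolean invalidatesServer(Phase phase, SocketError error) {
        if (phase == Phase.OPEN_CONNECTION) {
            return false; // failures to open a connection were ignored
        }
        // Errors from a running operation invalidate, except read timeouts.
        return !(error instanceof SocketReadTimeout);
    }

    public static void main(String[] args) {
        System.out.println("read error:   " + invalidatesServer(Phase.RUN_OPERATION, new SocketReadError()));
        System.out.println("write error:  " + invalidatesServer(Phase.RUN_OPERATION, new SocketWriteError()));
        System.out.println("read timeout: " + invalidatesServer(Phase.RUN_OPERATION, new SocketReadTimeout()));
        System.out.println("open failure: " + invalidatesServer(Phase.OPEN_CONNECTION, new SocketOpenError()));
    }
}
```

Only the first two cases return true, which is why an unreachable host (timeouts and open failures only) is never marked UNKNOWN by the operation path.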

All we can do is wait for the monitoring thread to wake up after the configured heartbeatFrequency interval, try to read the new server state, block, hit its own socket read timeout, and finally change the server state to UNKNOWN.
This takes quite a long time.
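A rough worst-case estimate of that delay, assuming the host becomes unreachable just after a successful heartbeat and that the monitor's blocked read is bounded by a single read timeout: the monitor sleeps one full heartbeat interval, then waits one read timeout. The class name and the 10-second values below are illustrative assumptions, not the driver's defaults.

```java
import java.time.Duration;

// Back-of-the-envelope estimate; the model and the numbers are assumptions.
class DetectionLatencySketch {

    // Worst case: the host became unreachable right after a successful heartbeat,
    // so the monitor first sleeps a full heartbeat interval, then blocks on a
    // read until its read timeout expires, and only then marks the server UNKNOWN.
    static Duration worstCaseMonitorDetection(Duration heartbeatFrequency,
                                              Duration monitorReadTimeout) {
        return heartbeatFrequency.plus(monitorReadTimeout);
    }

    public static void main(String[] args) {
        Duration heartbeat = Duration.ofSeconds(10);   // illustrative value
        Duration readTimeout = Duration.ofSeconds(10); // illustrative value
        System.out.println(worstCaseMonitorDetection(heartbeat, readTimeout)); // PT20S
    }
}
```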



 Comments   
Comment by Githook User [ 09/Jan/18 ]

Author: Jeff Yemin (username: jyemin, email: jeff.yemin@10gen.com)

Message: JAVA-2337: invalidate the server if opening a connection to it throws any MongoSocketException
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/fdae4206ee349b6e41599c613c5382046bda0a95
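The commit above changes the open-connection path. A minimal sketch of the idea, assuming a hypothetical Server type whose openConnection method wraps whatever actually opens the socket: any socket exception thrown while opening marks the server UNKNOWN and invalidates its pool before the exception is rethrown. SocketError and SocketOpenError are hypothetical stand-ins for MongoSocketException and MongoSocketOpenException; this is not the commit's actual code.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "invalidate the server if opening a connection to it
// throws any socket exception". Not the driver's actual code.
class InvalidateOnOpenFailureSketch {

    // Stand-ins for MongoSocketException and MongoSocketOpenException.
    static class SocketError extends RuntimeException {
        SocketError(String message) {
            super(message);
        }
    }

    static class SocketOpenError extends SocketError {
        SocketOpenError(String message) {
            super(message);
        }
    }

    enum ServerState { CONNECTED, UNKNOWN }

    static final class Server {
        ServerState state = ServerState.CONNECTED;
        int poolGeneration; // bumped on every invalidation

        void invalidate() {
            poolGeneration++;
            state = ServerState.UNKNOWN;
        }

        // 'opener' stands in for whatever actually establishes the socket.
        <T> T openConnection(Supplier<T> opener) {
            try {
                return opener.get();
            } catch (SocketError e) {
                invalidate(); // any socket error while opening marks the server UNKNOWN
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        try {
            server.openConnection(() -> {
                throw new SocketOpenError("connect timed out");
            });
        } catch (SocketError e) {
            System.out.println("open failed: " + e.getMessage());
        }
        System.out.println(server.state + " " + server.poolGeneration); // UNKNOWN 1
    }
}
```

Catching the base socket exception type, rather than only the open-specific one, matches the commit message's "any MongoSocketException"; the original exception is still rethrown so the caller sees the failure.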

Comment by Jeffrey Yemin [ 22/Aug/17 ]

The spec that describes this behavior has been updated to reflect this use case. See here for the relevant change.

So we can proceed with this in the next release. Happy to accept a pull request if you're so inclined.

Comment by Sergey Polovko [ 27/Feb/17 ]

Hi Jeff,

Do you have any thoughts on this use case? I would be happy to help and provide a PR if you think this issue needs one.

Thanks,
Sergey

Comment by Jeffrey Yemin [ 19/Dec/16 ]

Hi Sergey,

We are considering this use case and will get back to you when we've decided how to handle it.

Regards,
Jeff

Generated at Thu Feb 08 08:56:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.