Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.7.0
Affects Version/s: 3.0.0
Component/s: Cluster Management
Labels: None


There are several ways a replica server can be lost from the cluster:
1. The mongod process crashes, is killed, or exits.
2. The host becomes unreachable (it is turned off, or a network problem makes it unreachable).
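The second case can be roughly approximated on a single machine using only the JDK. The sketch below is an illustration, not driver code: a loopback listener completes the TCP handshake but never answers, so a blocking read can only end when the client's own socket timeout fires. The class name `SilentPeerDemo`, the method `readTimesOut`, and the 200 ms timeout are assumptions made for this example. A truly unreachable host would also drop the handshake, which this local setup does not reproduce.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it does not use or imitate any mongo-java-driver API.
final class SilentPeerDemo {

    /** Returns true if reading from a peer that never responds ends in a socket timeout. */
    static boolean readTimesOut(int soTimeoutMillis) {
        // The listener is never accept()ed and never writes; the OS backlog still
        // completes the TCP handshake, so the client believes it is connected.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silent.getInetAddress(), silent.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis); // without this, read() would block indefinitely
            try {
                client.getInputStream().read();
                return false; // the peer answered or closed the connection
            } catch (SocketTimeoutException e) {
                return true;  // no data arrived within the timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

By contrast, when the process dies but the host stays up, the peer's TCP stack actively rejects traffic, so the client sees an error immediately instead of waiting for a timeout.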

The async mongo-java-driver handles these two situations differently. I don't know how the synchronous driver behaves because I don't use it; it may have the same problem.

What happens from the async driver's perspective:

In the first case we start receiving TCP segments with the RST bit set, the connection is dropped, and the operation in progress fails with a MongoSocket(Read|Write)Exception. The DefaultServer class then invalidates the connection pool and changes the server state to UNKNOWN. This helps detect that the replica is down earlier (usually before the monitoring thread notices) and avoids using probably broken connections in subsequent operations.

In the second case we get no response at all, so the only way to detect the problem is a socket timeout. If nothing is received from the socket within the configured time, we get a MongoSocketReadTimeoutException. But the DefaultServer class does not invalidate the connection pool in this case. The server state is not changed, and we continue to use that server in subsequent operations. Because the current connection has not been released yet (we are still waiting for a response from the unreachable host), subsequent operations try to open new connections, fail, and get a MongoSocketOpenException after the configured socket connect timeout. This, too, is ignored by the DefaultServer class.
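The asymmetry described above can be summarised in a small model. This is a simplified sketch of the behaviour as reported in this ticket, not the driver's actual DefaultServer code. The class, enum, and constant names are hypothetical, and the `ALL_NETWORK_ERRORS` policy is only an illustration of treating every network failure alike, not a description of the eventual fix.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model; every name here is invented for illustration.
final class ServerErrorPolicySketch {

    enum ErrorKind { SOCKET_READ, SOCKET_WRITE, SOCKET_READ_TIMEOUT, SOCKET_OPEN }

    enum ServerState { CONNECTED, UNKNOWN }

    // Behaviour as reported: only read/write failures invalidate the pool.
    static final Set<ErrorKind> AS_REPORTED =
            EnumSet.of(ErrorKind.SOCKET_READ, ErrorKind.SOCKET_WRITE);

    // Assumed alternative: timeouts and failed connection attempts count as well.
    static final Set<ErrorKind> ALL_NETWORK_ERRORS = EnumSet.allOf(ErrorKind.class);

    /** State of a previously CONNECTED server after one failed operation. */
    static ServerState stateAfter(ErrorKind error, Set<ErrorKind> invalidatingErrors) {
        return invalidatingErrors.contains(error) ? ServerState.UNKNOWN : ServerState.CONNECTED;
    }

    public static void main(String[] args) {
        for (ErrorKind kind : ErrorKind.values()) {
            System.out.println(kind
                    + " as reported: " + stateAfter(kind, AS_REPORTED)
                    + ", all errors: " + stateAfter(kind, ALL_NETWORK_ERRORS));
        }
    }
}
```

Under the reported policy a read timeout leaves the server CONNECTED, so server selection keeps routing operations to a host that cannot answer.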

All we can do is wait until the monitoring thread wakes up after the configured HeartbeatFrequency interval, tries to read the new server state, blocks, hits its own socket read timeout, and finally changes the server state to UNKNOWN.
That is quite a long time.
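To put a rough number on that delay: if the host disappears right after a heartbeat, the monitor first sleeps for a full heartbeat interval and then blocks for up to its own socket read timeout. The sketch below simply adds the two. The class and method names are hypothetical, and the 10 s and 20 s values are illustrative inputs rather than figures taken from this ticket.

```java
import java.time.Duration;

// Hypothetical helper; a back-of-the-envelope estimate, not driver code.
final class DetectionDelaySketch {

    /** Approximate upper bound on the time until the monitor marks the server UNKNOWN. */
    static Duration worstCaseDetection(Duration heartbeatFrequency, Duration heartbeatSocketTimeout) {
        // Sleep until the next heartbeat, then wait for that heartbeat's read to time out.
        return heartbeatFrequency.plus(heartbeatSocketTimeout);
    }

    public static void main(String[] args) {
        Duration delay = worstCaseDetection(Duration.ofSeconds(10), Duration.ofSeconds(20));
        System.out.println(delay.getSeconds() + " s"); // prints "30 s"
    }
}
```

During that whole window, application operations routed to the unreachable server keep failing with their own read or connect timeouts.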

is depended on by

DRIVERS-429 Change handling of network errors or timeouts during connection handshake

Closed

Assignee: Jeffrey Yemin
Reporter: Sergey Polovko
Reviewers: None
Votes: 2
Watchers: 4

Created: Oct 07 2016 01:27:47 AM UTC
Updated: Jan 09 2018 02:22:07 PM UTC
Resolved: Jan 09 2018 02:22:07 PM UTC
Confidence Status Last Update: 03/Jan/18 7:43 PM
