Java Driver / JAVA-2337

Earlier detection when replica becomes unreachable

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.7.0
    • Affects Version/s: 3.0.0
    • Component/s: Cluster Management
    • Labels: None

      There are several ways a replica set member can be lost from the cluster:
      1. the mongod process crashes, is killed, or exits;
      2. the host becomes unreachable (it is turned off, or a network problem makes it unreachable).
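The two failure modes look quite different at the socket level. The following is a minimal sketch using plain `java.net` sockets on the loopback interface (not the driver), assuming that a peer which closes its socket approximates a crashed mongod, and that a peer which accepts the connection but never writes approximates an unreachable host. The second approximation only reproduces what the client observes on read: a truly unreachable host also stops acknowledging packets, which a local peer cannot simulate. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical demo class: contrasts a closed peer with a silent peer.
public class FailureModeDemo {

    private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    /** Case 1 (process gone): the peer's OS closes the connection, so the read fails at once. */
    public static long closedPeerMillis() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, LOOPBACK);
             Socket client = new Socket(LOOPBACK, server.getLocalPort())) {
            server.accept().close();    // simulate the mongod process exiting
            client.setSoTimeout(5_000); // safety net only; should never be reached
            long start = System.nanoTime();
            try {
                if (client.getInputStream().read() != -1) {
                    throw new IllegalStateException("unexpected data from closed peer");
                }
            } catch (SocketException reset) {
                // Some platforms report the closed peer as "Connection reset" instead of end of stream.
            }
            return elapsedMillis(start);
        }
    }

    /** Case 2 (host silent): nothing ever arrives, so only SO_TIMEOUT ends the read. */
    public static long silentPeerMillis(int soTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, LOOPBACK);
             Socket client = new Socket(LOOPBACK, server.getLocalPort());
             Socket silentPeer = server.accept()) { // accepted, but never written to
            client.setSoTimeout(soTimeoutMillis);
            long start = System.nanoTime();
            try {
                client.getInputStream().read();
                throw new IllegalStateException("read returned before the timeout");
            } catch (SocketTimeoutException e) {
                return elapsedMillis(start);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("closed peer noticed after " + closedPeerMillis() + " ms");
        System.out.println("silent peer noticed after " + silentPeerMillis(2_000) + " ms");
    }
}
```

On a typical machine the first read ends almost immediately, while the second takes roughly the configured timeout. From the client's side, nothing but the timeout distinguishes a silent host from a slow one.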

      The async mongo-java-driver handles these two situations differently. I don't know how the synchronous driver behaves because I'm not using it; it may have the same problems.

      What happens from the async driver's perspective:

      In the first case we start receiving TCP segments with the RST bit set, the connection is dropped, and we get a MongoSocketReadException or MongoSocketWriteException while executing an operation. The DefaultServer class then invalidates the connection pool and changes the server state to UNKNOWN. This detects that the replica is down early (usually before the monitoring thread does) and avoids reusing probably broken connections in subsequent operations.

      In the second case we receive nothing at all, so the only way to detect the problem is through socket timeouts. If nothing arrives on the socket within the configured time, we get a MongoSocketReadTimeoutException. However, DefaultServer does not invalidate the connection pool in this case: the server state is not changed, and the server keeps being used for subsequent operations. Because the current connection has not been released yet (it is still waiting for a response from the unreachable host), subsequent operations try to open new connections, fail, and get a MongoSocketOpenException after the configured connect timeout. DefaultServer ignores this exception as well.
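A small hypothetical model of the error handling described above may make the asymmetry easier to see. It models the behaviour as reported in this issue (not the change that resolved it), and it is not the driver's code: the `Failure` enum values stand in for MongoSocketReadException/MongoSocketWriteException, MongoSocketReadTimeoutException and MongoSocketOpenException, and the class name, `ServerState` values and pool "generation" counter are illustrative assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the reported DefaultServer behaviour; not the driver's API.
public class DescribedErrorPolicy {

    enum ServerState { CONNECTED, UNKNOWN }

    // Stand-ins for the driver exceptions named in the report.
    enum Failure { READ_OR_WRITE_ERROR, READ_TIMEOUT, OPEN_ERROR }

    // Per the report, only read/write errors (e.g. a connection reset) trigger invalidation.
    private static final Set<Failure> INVALIDATING = EnumSet.of(Failure.READ_OR_WRITE_ERROR);

    private ServerState state = ServerState.CONNECTED;
    private int poolGeneration = 0; // incremented whenever the connection pool is invalidated

    void onFailure(Failure failure) {
        if (INVALIDATING.contains(failure)) {
            poolGeneration++;            // existing pooled connections are discarded
            state = ServerState.UNKNOWN; // server is treated as unavailable
        }
        // READ_TIMEOUT and OPEN_ERROR fall through: state and pool stay untouched,
        // so subsequent operations keep being routed to the unreachable server.
    }

    ServerState state() { return state; }

    int poolGeneration() { return poolGeneration; }

    public static void main(String[] args) {
        DescribedErrorPolicy unreachable = new DescribedErrorPolicy();
        unreachable.onFailure(Failure.READ_TIMEOUT);
        unreachable.onFailure(Failure.OPEN_ERROR);
        System.out.println("unreachable host: " + unreachable.state()); // prints "unreachable host: CONNECTED"

        DescribedErrorPolicy crashed = new DescribedErrorPolicy();
        crashed.onFailure(Failure.READ_OR_WRITE_ERROR);
        System.out.println("crashed mongod: " + crashed.state()); // prints "crashed mongod: UNKNOWN"
    }
}
```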

      All we can do is wait until the monitoring thread wakes up after the configured HeartbeatFrequency, tries to read the new server state, blocks until its own socket read timeout expires, and finally changes the server state to UNKNOWN.
      This takes quite a long time.
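A rough bound on how long this monitor-based detection can take, assuming the host becomes unreachable just after a successful heartbeat and the monitor reuses its existing connection (if it must reconnect, the connect timeout takes the place of the read timeout). Here $f$ is the HeartbeatFrequency and $t_{\text{read}}$ is the monitor's socket read timeout; both symbols are introduced only for this estimate:

```latex
T_{\text{detect}} \lesssim f + t_{\text{read}}
```

For example, with illustrative values $f = 10\,\text{s}$ and $t_{\text{read}} = 20\,\text{s}$ (not necessarily the driver's defaults), the server can stay selectable for up to about 30 s, and every operation routed to it during that window waits out its own socket read or connect timeout.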

            Assignee: Jeffrey Yemin (jeff.yemin@mongodb.com)
            Reporter: Sergey Polovko (jamel)
            Votes: 2
            Watchers: 4
