[JAVA-3301] Java Driver Hangs when replica set down Created: 27/May/19  Updated: 11/Sep/19  Resolved: 04/Jun/19

Status: Closed
Project: Java Driver
Component/s: Cluster Management, Connection Management
Affects Version/s: 3.6.4
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Muzzammil Ayyubi Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 18.04. running mongodb in docker


Issue Links:
Related
is related to JAVA-1868 MongoDB Java Client gets stuck after ... Closed

 Description   

Hi,

I am using Spring Data MongoDB (2.0.8), which uses the MongoDB Java driver (3.6.4). I am using Docker to set up my replica set cluster.

Something odd happens when one of the nodes goes down and I restart my app: it hangs at the line

[2019-05-27 15:52:55,341] [main] INFO  connection:71 - Opened connection [connectionId{localValue:4, serverValue:6}] to mongodbms_2:27017

where mongodbms_2 is the running master node.
Then I set the 'socketTimeout' and 'connectTimeout' properties to 5000. Now the app starts and connects successfully, but after passing the line above it fails with the error below:

Caused by: com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
    at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:530)
    at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:421)
    at com.mongodb.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:290)
 
................
................
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)

 

It then shuts down the whole application, instead of either giving up on the failed MongoDB node or continuing to look for it without hanging or stopping the application.

I have two questions:

1) Why does it hang when the socketTimeout or connectTimeout property is not set?

2) When the properties are set, why does the app simply fail to start?



 Comments   
Comment by Muzzammil Ayyubi [ 06/Jun/19 ]

Hi,

Exactly (the issue is not identical, but it is very close). I checked the ticket above and made the changes suggested there, but it did not help much.
My question is: if this is a network connectivity problem, why do similar services work with the same configuration while this one service does not? The only difference is the library version.
If it were a network issue, none of the services would have started successfully once the node went down.

Comment by Mstthias Müller [ 06/Jun/19 ]

Hi,

 

This seems to be related to JAVA-1868, where the cluster failover does not work when a new primary is elected.

Regards

Matthias

Comment by Muzzammil Ayyubi [ 05/Jun/19 ]

Hi Ross,

Thanks for the reply. I posted the question on the mailing list as you suggested.

https://groups.google.com/forum/#!topic/mongodb-user/HXyxUQKB4co

 

But I think it is specific to the driver, because we have 12 other services (spring-boot) running and working fine with failover. Only this one service hangs, and the only difference between it and the rest is the mongo driver version.
Hanging service driver version : 3.6.4

Other service driver version : 3.4.3

Also, I am not sure it hangs for exactly 2 hours, but after some time it starts throwing a MongoException error.

If the issue were network connectivity between the node and the app, it would have happened to all the other services with the same MongoDB configuration.

 

Please let me know if I am missing something, or if there is a way to configure the driver so that it does not block for so long looking up a failed instance and lets the application start.
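
For reference, the timeouts discussed in this ticket can also be set through standard connection string options (connectTimeoutMS, socketTimeoutMS, serverSelectionTimeoutMS). The following is only a minimal sketch of assembling such a URI with plain Java: the class name, the helper buildUri, the host mongodbms_1 and the replica set name rs0 are hypothetical and not taken from the ticket, and the 5000 ms values simply mirror the ones tried in the description. These options bound how long each step waits; they do not by themselves make the application start when no node is reachable.

```java
import java.util.Arrays;
import java.util.List;

public class TimeoutUriSketch {

    /**
     * Builds a MongoDB connection string with explicit timeouts.
     * All parameters are illustrative; adjust them to the real deployment.
     */
    public static String buildUri(List<String> hosts, String replicaSet,
                                  int connectTimeoutMs, int socketTimeoutMs,
                                  int serverSelectionTimeoutMs) {
        return "mongodb://" + String.join(",", hosts)
                + "/?replicaSet=" + replicaSet
                // Upper bound for establishing a TCP connection to a node.
                + "&connectTimeoutMS=" + connectTimeoutMs
                // Upper bound for a single read on an open connection.
                + "&socketTimeoutMS=" + socketTimeoutMs
                // Upper bound for finding a suitable server (e.g. a primary).
                + "&serverSelectionTimeoutMS=" + serverSelectionTimeoutMs;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and replica set name, for illustration only.
        List<String> hosts = Arrays.asList("mongodbms_1:27017", "mongodbms_2:27017");
        System.out.println(buildUri(hosts, "rs0", 5000, 5000, 5000));
    }
}
```

In a Spring Boot service, a string like this would typically be supplied through the spring.data.mongodb.uri property.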

 

Comment by Ross Lawley [ 04/Jun/19 ]

Hi muzzamongo,

Thanks for the questions.

> 1) Why it is hanging when not passing the socketTimeout or connectTimeout property

Here the hang occurs in Java's networking (socket) layer, so it depends on the global system settings for socket timeouts, which on Linux default to 7200 seconds (2 hours).

> 2) If passing property then why it is just failed to start the app.

Here the read from the socket timed out because of the timeout you configured, so that works as expected.
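
Both answers can be illustrated with plain JDK sockets. This is a minimal sketch, not the driver's code, and the class and method names are made up for illustration: a local server accepts a connection but never replies, and the client reads with SO_TIMEOUT set. With a positive timeout the read ends in java.net.SocketTimeoutException, the same exception type as at the bottom of the trace in the description. With the JDK default of 0 the same read has no deadline of its own, so the sketch never calls it that way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    /**
     * Connects to a local server that never sends data and attempts one read
     * with the given SO_TIMEOUT (must be > 0 here, since 0 means "no timeout").
     * Returns true if the read ended with a SocketTimeoutException.
     */
    public static boolean readTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the server never writes a byte.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```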

Both errors point to a networking issue, which is the cause of the hang and the timeouts. You should ensure all nodes have the correct network permissions and open ports.

I hope that helps. For future reference, this project is for Java driver bugs and feature requests. The best place for questions about MongoDB usage or Java driver specifics is the mongodb-user mailing list or Stack Overflow, as you will reach a broader audience there. If your business requires an answer from MongoDB within a set time frame, we do offer production support.

If you do follow up via one of the options above, please post a link and I will follow the conversation there.

Ross

Generated at Thu Feb 08 08:59:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.