[JAVA-1113] Connections to failed primary replica left in CLOSE_WAIT state Created: 15/Feb/14  Updated: 23/Jun/15  Resolved: 23/Jun/15

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chris LeCompte Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to JAVA-710 Support max connection idle time and ... Closed

 Description   

To recreate:

1. Create a 3 node replica set (2 data nodes and 1 arbiter).
2. Start up an application using the mongodb driver with connections to the replica set.
3. Kill the primary.
4. Allow the driver to detect that the primary is down.
5. Start the primary again.

Run netstat -a | grep CLOSE_WAIT. You should see a couple of connections left in the CLOSE_WAIT state for the server that failed. When the driver detects that a host is down, it should close these connections.
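
For reference, a minimal sketch of step 2, assuming the 2.11.x driver noted in the comments and hypothetical host names:

import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

import java.util.Arrays;

public class ReplicaSetRepro {
    public static void main(String[] args) throws Exception {
        // Connect with a seed list of the two data-bearing members (host names are placeholders).
        MongoClient client = new MongoClient(Arrays.asList(
                new ServerAddress("rs-node-1", 27017),
                new ServerAddress("rs-node-2", 27017)));

        // Issue a trivial command periodically so the driver keeps its connection pool
        // and replica set monitor active while the primary is killed and restarted.
        while (true) {
            client.getDB("test").command("ping");
            Thread.sleep(5000);
        }
    }
}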



 Comments   
Comment by Jeffrey Yemin [ 31/Jul/14 ]

Has anyone had a chance to test this with the 2.12 driver? Please let me know, as I'd like to resolve this issue if it's not reproducible in 2.12.3.

Regards,
Jeff

Comment by Jeffrey Yemin [ 25/Apr/14 ]

I think what you are describing is a different issue, but before filing it, I encourage you to try out the 2.12.0 driver, which contains a substantially improved replica set monitor. For a list of improvements and bug fixes, use this filter: https://jira.mongodb.org/issues/?jql=project%20%3D%20JAVA%20AND%20fixVersion%20%3D%20%222.12.0%22%20AND%20component%20%3D%20%22Cluster%20Management%22

Comment by Jason McCay [ 25/Apr/14 ]

Jeff, thank you for your response.

By "application crashed" I mean he received connection errors and his application stopped working because it could not communicate with his replica set. According to his assessment, he tested stepdown/state change events with all members of the replica set online, and his application reconnected to the new primary normally.

However, if a set member was completely unavailable (offline), the driver failed, probably because it timed out trying to talk to that member.

We have seen this behavior with other mongo drivers (for other languages), where the driver can handle a state change but cannot handle members dropping out of the set: everything halts while it attempts to connect to all members, or it doesn't understand the error message.

There was a recent regression fixed in Moped for this. Just to confirm: if I have gotten way outside the bounds of this ticket, I will go open a new one. Just say the word.

Comment by Jeffrey Yemin [ 24/Apr/14 ]

Hi Jason,

What do you mean when you say the application crashed? Do you mean that the JVM crashed or just that an exception was thrown? Do you have a stack trace or any additional information that could shed light on the root cause?

The experience of your customer sounds quite different from what is described in this issue, so I would suggest opening a new one.

Comment by Jason McCay [ 24/Apr/14 ]

I am jumping on this ticket because this issue sounds somewhat similar to something one of our customers experienced today. The customer claimed that they had extensively tested replica set failovers and they worked properly.

However, today one of the members of the replica set became completely unavailable (as in, server down). When that happened, the driver failed to connect properly to the remaining available member (which had become primary), and the customer's application crashed.

Customer driver information:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>2.11.4</version>
</dependency>

Comment by Jeffrey Yemin [ 09/Apr/14 ]

Hi Chris,

Checking back with you once more before closing this ticket.

Regards,
Jeff

Comment by Jeffrey Yemin [ 27/Mar/14 ]

Hi Chris,

If I'm understanding the issue correctly, there may be a way to handle this using the new maxConnectionIdleTime setting, which lets you specify a maximum idle time for pooled connections.

A background thread, which kicks off once per minute, closes any pooled connections that have been idle longer than the specified value.

Please let me know if this satisfies your requirement.
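
For illustration, a minimal sketch of that setting, assuming the 2.12 driver and a hypothetical host name:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

public class IdleConnectionExample {
    public static void main(String[] args) throws Exception {
        // Close pooled connections that have been idle for more than 60 seconds;
        // the background reaper thread runs roughly once per minute.
        MongoClientOptions options = MongoClientOptions.builder()
                .maxConnectionIdleTime(60000) // milliseconds
                .build();

        MongoClient client = new MongoClient(new ServerAddress("rs-node-1", 27017), options);
        try {
            // ... use the client as usual ...
        } finally {
            client.close();
        }
    }
}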

Comment by Chris LeCompte [ 15/Feb/14 ]

Note this is with version 2.11.3 of the java driver.
