[CSHARP-2648] Connection Reset By Peer - with driver 2.8.0 and mongo 4.0.9 on a k8s cluster Created: 24/Jun/19 Updated: 20/Jul/20 Resolved: 20/Jul/20 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alok Kumar | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
AWS EKS, Ubuntu base images and Ubuntu host for clients, Mongo 4.0.9 docker image (replica set) hosted on AWS EKS - same cluster as the clients. |
||
| Description |
|
We have been getting "Connection Reset by Peer" mongo errors in our setup. A description of the setup:
We observed that a series of calls, say 500 key-based selects, completes without issue. If we then pause for 5 minutes and repeat the test, the first call fails with "Connection Reset by Peer"; the rest of the test then continues normally. This happens every time after a pause. The same pattern occurs with real user behavior, where spurts of activity are followed by a lull. As a consequence, we keep getting "Connection reset by peer" at critical points in the business workflow. On the client side, the workaround is to code defensively and repeat the call, but that requires changes in many places. Other combinations attempted:
However, none of these changed the behavior. It appears to us that although the TCP connection has been closed on the server side, the client still treats it as valid and attempts to use it, leading to this error. Has anybody else faced such a situation? Any suggestions would be appreciated; we are happy to provide more information if needed. |
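The client-side workaround mentioned in the description (repeat the call when the connection turns out to be stale) can be centralized in one helper rather than repeated at every call site. The following is a minimal, language-neutral sketch in Python, not driver code: `call_with_retry` and `FlakyLookup` are hypothetical names introduced for illustration, and the stale connection is simulated by raising `ConnectionResetError` on the first call.

```python
import time


def call_with_retry(operation, max_attempts=2, delay_seconds=0.0):
    """Run operation(); retry if the connection was reset by the peer.

    Hypothetical helper for illustration; not part of any MongoDB driver.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionResetError:
            # Give up once the attempt budget is spent.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


class FlakyLookup:
    """Stand-in for a key-based select whose first call hits a stale connection."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionResetError("Connection reset by peer")
        return {"_id": 42}


lookup = FlakyLookup()
result = call_with_retry(lookup)  # first attempt fails, second succeeds
```

A wrapper like this is only safe around idempotent operations such as reads; blindly retrying a write could apply it twice.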
| Comments |
| Comment by Jeffrey Yemin [ 20/Jul/20 ] |
|
Sorry for losing track of this. Do you have a full stack trace available? I'm surprised this would happen after a pause given that idle connections in the pool should have been pruned in that 5 minute interval. Also, as this sounds more like a support issue, I wanted to give you some other resources to get this question answered more quickly:
I'm going to close this now, but happy to re-open if you have more information. |
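For readers unfamiliar with idle pruning, the idea referred to above can be sketched as follows. This is a toy model in Python, assuming a pool that timestamps each connection when it is returned and discards it at checkout if it has been idle longer than a configured limit; `IdlePruningPool` and its parameters are hypothetical and do not reflect the driver's actual implementation. A fake clock is injected so the 5-minute pause can be simulated without sleeping.

```python
import time


class IdlePruningPool:
    """Toy connection pool that discards connections idle for too long.

    Hypothetical sketch; real driver pools are considerably more involved.
    """

    def __init__(self, connect, max_idle_seconds, clock=time.monotonic):
        self._connect = connect          # factory that opens a new connection
        self._max_idle = max_idle_seconds
        self._clock = clock              # injectable so the demo needs no sleeping
        self._idle = []                  # (connection, time it was returned)

    def acquire(self):
        now = self._clock()
        while self._idle:
            conn, returned_at = self._idle.pop()
            if now - returned_at <= self._max_idle:
                return conn
            # Idle too long: the peer may have dropped it, so discard it.
            # (A real pool would also close the underlying socket here.)
        return self._connect()

    def release(self, conn):
        self._idle.append((conn, self._clock()))


fake_now = [0.0]
opened = []


def connect():
    opened.append(f"conn-{len(opened) + 1}")
    return opened[-1]


pool = IdlePruningPool(connect, max_idle_seconds=60, clock=lambda: fake_now[0])
first = pool.acquire()    # opens conn-1
pool.release(first)
fake_now[0] = 30.0
reused = pool.acquire()   # idle for 30 s: conn-1 is reused
pool.release(reused)
fake_now[0] = 330.0       # simulate the 5-minute pause
fresh = pool.acquire()    # idle for 300 s: conn-1 is discarded, conn-2 is opened
```

If the idle limit is shorter than whatever timeout closes the connection on the server or network side, the first call after a pause gets a fresh connection instead of a stale one.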
| Comment by Riaz Ahmad [ 23/Jul/19 ] |
|
This issue looks to be related to |
| Comment by Alok Kumar [ 25/Jun/19 ] |
|
I need to make a correction to this issue report.
Initial settings were
With these settings we did get "connection reset by peer" errors, though not many.
Subsequent settings were
After this change, the "connection reset by peer" errors increased.
Now, we have changed these settings to the following values:
The errors have now dropped significantly. With larger connection lifetimes, the error count is higher. I could not find a way to edit the original issue; if someone can edit it for me, it would be appreciated.
|