[CSHARP-1532] ConnectionPool death Created: 13/Jan/16 Updated: 20/Jan/16 Resolved: 14/Jan/16 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bret Ferrier | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | c#, connection, driver | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
.net 4.5.2 Azure Websites |
||
| Issue Links: |
|
||||||||
| Description |
|
So I have been having an intermittent issue where the app is no longer able to connect to the database. I first increased the number of available connections in the connection pool for a Medium size Azure Website (2gigs of memory) to 400 which decreased the frequency and mostly made the error go away. I opened a help ticket with Microsoft and was able to track the error down to running out of memory on one of 3 VMs that are running the site. After looking over my code it appears that the error is being caused by the C# driver not recovering once the connection pool gets in a bad state. Whenever the traffic grows to my site new VMs will automatically be added by Azure and I often have about 6 VMs serving the site. When an error occurs connecting to Mongo it will only occur on one server and that server will become unresponsive and the error below is what shows up in the error logs. It appears that after some amount of time the ConnectionPool can get in a bad state and that it never recovers until the site is restarted. Error Message: System.TimeoutException: Timed out waiting for a connection after 0ms. |
| Comments |
| Comment by ravindra [ 20/Jan/16 ] |
|
Yes I did upgrade during last week of December and deployed on the servers last Friday and hence the issue. Thanks for the quick patch. |
| Comment by Bret Ferrier [ 20/Jan/16 ] |
|
This is also a regression in the newer drivers it seems that weren't out 6 months ago. So at some point you must have upgraded your drivers. |
| Comment by Craig Wilson [ 19/Jan/16 ] |
|
Presumably, your load got a little higher last week. This is reproducible by simply having a higher load. |
| Comment by ravindra [ 19/Jan/16 ] |
|
ok thanks just saw that. Now any way to reproduce this issue, reason behind that is as I said we have been running in production for 6 months and never faced this issue and it happened twice in last 1 week. Regards |
| Comment by Craig Wilson [ 19/Jan/16 ] |
|
Ravindra, It has been released already. You can get it from nuget or github. Craig |
| Comment by ravindra [ 19/Jan/16 ] |
|
We are running into the same issue, however our site is using mongoDB for the past 6 months with legacy API , but seen this issue for the first time on Saturday and again Today. Recycling the web server process (w3wp.exe) fixes it but that seems to be a temporary solution. When can we expect a patch on this. Ravi |
| Comment by Bret Ferrier [ 14/Jan/16 ] |
|
Ok, hopefully that will go out to Nuget soon. |
| Comment by Craig Wilson [ 14/Jan/16 ] |
|
Hi Bret, Since you have some legacy API involved, I believe the problem is a duplicate of Craig |
| Comment by Bret Ferrier [ 14/Jan/16 ] |
|
Craig, our code base contains calls that utilize both the legacy and the new API. With that being said the "hot" paths that are called most often have been converted to use the Async version of the new API and that is where I am seeing the errors as 95% of the traffic is for the same few pages that mainly use the Async version (You can see in the call stack that it is calling ToListAsync. I have tried to get a mini/full dump of the server when it was in an error state but don't have the same kind of access as its a Azure Website and there wasn't enough memory on the machine left to do a dump. It really seems like the ConnectionPool gets poisoned and never recovers, new servers are spun up but the one is a lame duck. |
| Comment by Craig Wilson [ 13/Jan/16 ] |
|
Hi Bret, Are you using the legacy API or the new API? And are you using the sync or async versions? Craig |