[CSHARP-1515] Queries using the Legacy API can leak connections resulting in TimeoutExceptions Created: 18/Dec/15 Updated: 16/Nov/21 Resolved: 13/Jan/16 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.2 |
| Fix Version/s: | 2.2.2 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Ben Origas | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 5 |
| Labels: | regression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
After upgrading from 2.0 to 2.2, I started getting System.TimeoutExceptions a few minutes after starting my app. The timeouts are occurring after 0ms, which seems like a problem: System.TimeoutException: Timed out waiting for a connection after 0ms. 2016-01-14 update: The bug is that queries using the Legacy API can fail to return a connection to the connection pool. It is triggered when the result set is large enough that the results are returned in multiple batches. Each time such a query runs, one connection fails to be returned to the pool; once every connection in the pool has been leaked, no further queries can be made (the symptom being a TimeoutException thrown from AcquireConnection). |
| Comments |
| Comment by Moshe Shperling [ 19/Jan/16 ] | ||||||||
|
ok. no problem | ||||||||
| Comment by Craig Wilson [ 19/Jan/16 ] | ||||||||
|
Moshe, this seems like a completely different issue; I've created a separate ticket to track it. Craig | ||||||||
| Comment by Moshe Shperling [ 19/Jan/16 ] | ||||||||
|
I call Connect once per process; the same goes for MongoServer. | ||||||||
| Comment by Craig Wilson [ 19/Jan/16 ] | ||||||||
|
Moshe, Are you calling Connect a lot? Technically, you don't even need to call Connect. Even if you are, it only needs to be called once. The reason I'm interested is because it makes me wonder about what the code looks like below this... Could you provide a snippet of code around this call and perhaps some context for it? For instance, do you get a new MongoServer and call Connect for every web request? Craig | ||||||||
| Comment by Moshe Shperling [ 19/Jan/16 ] | ||||||||
|
hi Craig | ||||||||
| Comment by Craig Wilson [ 19/Jan/16 ] | ||||||||
|
Hi Moshe, First off, please remember this is a public ticket and your stack trace contained your server information. I've removed it. The stack trace you posted above is telling us it cannot talk to the server in your connection string. That's what the Disconnected means above. Please ensure that your hostname and ports are correct and that you have network connectivity to this box. One way to check this is to use the mongo shell. Craig | ||||||||
| Comment by Moshe Shperling [ 19/Jan/16 ] | ||||||||
|
Here is what we add to the connection string: connecttimeout=300000;maxpoolsize=800;waitQueueSize=2000;waitQueueTimeout=5m. But there are still bottlenecks; this did not happen with the old legacy driver. | ||||||||
| Comment by Moshe Shperling [ 19/Jan/16 ] | ||||||||
|
Hi. [Truncated cluster description from the stack trace] Client view of cluster: { EndPoint: "Unspecified/host/port", State: "Disconnected", Type: "Unknown" } | ||||||||
| Comment by Githook User [ 14/Jan/16 ] | ||||||||
|
Author: rstam <robert@robertstam.org>. Message: | ||||||||
| Comment by Githook User [ 14/Jan/16 ] | ||||||||
|
Author: rstam <robert@robertstam.org>. Message: | ||||||||
| Comment by Githook User [ 13/Jan/16 ] | ||||||||
|
Author: rstam <robert@robertstam.org>. Message: | ||||||||
| Comment by Githook User [ 13/Jan/16 ] | ||||||||
|
Author: rstam <robert@robertstam.org>. Message: | ||||||||
| Comment by Craig Wilson [ 13/Jan/16 ] | ||||||||
|
Our release notes document what was fixed and what new features were added. You can see those here: https://github.com/mongodb/mongo-csharp-driver/releases/tag/v2.2.0. | ||||||||
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
ok. | ||||||||
| Comment by Craig Wilson [ 13/Jan/16 ] | ||||||||
|
You can get the DLLs from the same place you got the 2.2 version: either GitHub releases or NuGet. It supports server 3.2, but not any of its new features. You would absolutely want to test your application before deploying to production. | ||||||||
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
ok. | ||||||||
| Comment by Craig Wilson [ 13/Jan/16 ] | ||||||||
|
Yes. Every 2.x release supports the legacy API. | ||||||||
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
Our flow uses the legacy API. Does 2.1 support it? | ||||||||
| Comment by Craig Wilson [ 13/Jan/16 ] | ||||||||
|
moshe@imonlinegroup.com, can you elaborate on why you can't downgrade? You mentioned SCRAM-SHA-1, but that is supported in driver version 2.1. Which feature is preventing the downgrade? | ||||||||
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
I am glad you managed to reproduce it, and hope you will provide a fix soon. We need it badly; as I told you, we cannot downgrade. Thanks. | ||||||||
| Comment by Robert Stam [ 13/Jan/16 ] | ||||||||
|
I have been able to reproduce this. In order to reproduce it I had to increase the size of the collection my test program was reading, so that the number of documents being returned was more than would fit in a single batch. | ||||||||
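A minimal sketch of that kind of reproduction, assuming a local mongod and hypothetical database/collection names (the ticket does not include the actual test program). It uses the legacy API from the mongocsharpdriver package, where each full enumeration of a multi-batch result set leaks one pooled connection in 2.2.0/2.2.1:

```csharp
// Hypothetical reproduction sketch — "test"/"items" are placeholder names.
// Requires a running mongod on localhost and the mongocsharpdriver (legacy) package.
using MongoDB.Bson;
using MongoDB.Driver;

class Repro
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var server = client.GetServer();                        // legacy entry point
        var db = server.GetDatabase("test");
        var collection = db.GetCollection<BsonDocument>("items");

        // The collection must be large enough that FindAll() returns results in
        // multiple batches. With the default pool of 100 connections, each full
        // read leaks one connection, so roughly the 101st query would throw a
        // TimeoutException from AcquireConnection on the affected versions.
        for (var i = 0; i < 200; i++)
        {
            foreach (var doc in collection.FindAll()) { }       // enumerate all batches
        }
    }
}
```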
| Comment by Robert Stam [ 13/Jan/16 ] | ||||||||
|
What would help the most is a small self contained application that reproduces the problem. Then I can run that locally and figure out what is happening. | ||||||||
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
hi Robert | ||||||||
| Comment by Robert Stam [ 13/Jan/16 ] | ||||||||
|
I have not been able to reproduce this. I am using a test program with 400 threads. Each thread repeatedly reads the entire contents of a collection in an infinite loop. Using the default connection pool settings of 100 connections and a wait queue size of 500, my test program would have 100 threads actively reading at one time while the other 300 threads are waiting to acquire a connection. Since 300 < 500 I would not expect any WaitQueueFullExceptions. My test program has been running overnight without throwing a single TimeoutException. I am using version 2.2.1 of the driver and using the Legacy API. | ||||||||
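The test setup Robert describes might be sketched as follows (hypothetical host, database, and collection names; requires a running mongod). With maxPoolSize=100 and waitQueueSize=500, at most 100 threads hold connections while the remaining 300 wait, staying under the wait-queue limit:

```csharp
// Sketch of the described stress test, not the actual test program.
// Requires a running mongod and the mongocsharpdriver (legacy) package.
using System.Threading;
using MongoDB.Bson;
using MongoDB.Driver;

class StressTest
{
    static void Main()
    {
        var server = new MongoClient("mongodb://localhost:27017").GetServer();
        var collection = server.GetDatabase("test").GetCollection<BsonDocument>("docs");

        // 400 threads, each reading the entire collection in an infinite loop.
        for (var i = 0; i < 400; i++)
        {
            new Thread(() =>
            {
                while (true)
                {
                    foreach (var doc in collection.FindAll()) { }  // full read
                }
            }) { IsBackground = true }.Start();
        }
        Thread.Sleep(Timeout.Infinite);
    }
}
```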
| Comment by Moshe Shperling [ 13/Jan/16 ] | ||||||||
|
Hi Robert, we are still getting this error. Did you manage to reproduce it? Please assist. | ||||||||
| Comment by Moshe Shperling [ 12/Jan/16 ] | ||||||||
|
Robert, your fixes to the connection string solved it for me for now. I will keep an eye on it. Thanks. | ||||||||
| Comment by Robert Stam [ 12/Jan/16 ] | ||||||||
|
2.3 is not out. The current version is 2.2.1. We are trying to reproduce this but have not yet been able to. | ||||||||
| Comment by Moshe Shperling [ 12/Jan/16 ] | ||||||||
|
Hi,
It breaks our system and we cannot downgrade, since all our users use SCRAM-SHA-1 for auth, our code uses the legacy API, and the old legacy driver supports only the old auth mechanism. Please help; it is very URGENT! | ||||||||
| Comment by Robert Stam [ 12/Jan/16 ] | ||||||||
|
It's difficult to tell from the stack trace whether this is a bug or simply the result of a heavy load running on top of a connection pool that is too small to support that load. The default connection pool size is 100 connections, and the default size of the wait queue (tasks/threads that are waiting for a connection to be released back into the connection pool) is 500. In a heavily loaded system there can be more tasks/threads than there are connections, and in that scenario tasks/threads can get a TimeoutException while waiting for a connection to be released back to the connection pool. Can you try configuring a larger connection pool, and/or a larger wait queue size and wait queue timeout? That would help determine whether this exception is simply the result of a heavy load, or whether there is a bug related to connection pooling. These settings can all be configured on the connection string:
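The example itself was not captured in this export; a reconstruction, using a placeholder hostname, the default pool size (100) and wait queue size (500) quoted above, and a 2-minute waitQueueTimeout (my assumption for the default, not stated in the ticket):

```
mongodb://hostname/?maxPoolSize=100;waitQueueSize=500;waitQueueTimeout=2m
```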
The values shown in the example above are the default values. | ||||||||
| Comment by Robert Stam [ 12/Jan/16 ] | ||||||||
|
The "after 0ms" in the error message is a red herring. There is a bug in the error message. See | ||||||||
| Comment by Seva Feldman [ 12/Jan/16 ] | ||||||||
|
Same issue here. I understand 2.2.1 didn't solve the problem. | ||||||||
| Comment by Kevin Fairs [ 12/Jan/16 ] | ||||||||
|
We have hit this exact same issue with 2.2, rolling back to 2.1 works for now, but our upgrade to server 3.2 is now on hold until this is resolved. Is there any timescale for resolution? Thanks. | ||||||||
| Comment by Ben Origas [ 08/Jan/16 ] | ||||||||
|
Have you had a chance to investigate this at all? I'd like to update the server to 3.2, but can't until we figure this out so I can run the 2.2 driver. | ||||||||
| Comment by Craig Wilson [ 23/Dec/15 ] | ||||||||
|
Hi Ben, There was a large-ish change in which we added a sync stack down through Core, and all of the legacy API (which you are using) goes through it. It's possible there is some type of connection leak going on. We'll take a look as soon as we get back. For now, I'd recommend downgrading to either 2.1 or back to 2.0. Craig | ||||||||
| Comment by Ben Origas [ 23/Dec/15 ] | ||||||||
|
Any ideas? | ||||||||
| Comment by Ben Origas [ 18/Dec/15 ] | ||||||||
We then hold onto _mongoDatabase and use it for the lifetime of the app. It seems to work for the first few requests, but after 5+ minutes, we start getting timeouts. | ||||||||
| Comment by Craig Wilson [ 18/Dec/15 ] | ||||||||
|
Could you provide your connection string and any setup code you have? |