[CSHARP-1515] Queries using the Legacy API can leak connections resulting in TimeoutExceptions Created: 18/Dec/15  Updated: 16/Nov/21  Resolved: 13/Jan/16

Status: Closed
Project: C# Driver
Component/s: Connectivity
Affects Version/s: 2.2
Fix Version/s: 2.2.2

Type: Bug Priority: Blocker - P1
Reporter: Ben Origas Assignee: Robert Stam
Resolution: Done Votes: 5
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates CSHARP-1528 The message of the TimeoutException t... Closed
is duplicated by CSHARP-1435 System.TimeoutException: A timeout oc... Closed
is duplicated by CSHARP-1532 ConnectionPool death Closed
Related
related to CSHARP-1536 Cursor do not get closed Closed
related to CSHARP-1538 Legacy Driver Connect() times out. Closed

 Description   

After upgrading from 2.0 to 2.2, I started getting System.TimeoutExceptions a few minutes after starting my app. The timeouts are occurring after 0ms, which seems like it's a problem.

System.TimeoutException: Timed out waiting for a connection after 0ms.
at MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.AcquireConnectionHelper.EnteredPool(Boolean enteredPool)
at MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.AcquireConnection(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ClusterableServer.GetChannel(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.FindOperation`1.Execute(IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCursor`1.GetEnumerator()

2016-01-14:

The bug is that queries using the Legacy API can fail to return a connection to the connection pool. The bug is triggered when the result set is large enough that the results are returned in multiple batches. Each time such a query is made one connection is failed to be returned to the connection pool, and once all the connections in the pool have been leaked no further queries can be made (with the symptom being that a TimeoutException is thrown from AcquireConnection).



 Comments   
Comment by Moshe Shperling [ 19/Jan/16 ]

ok. no problem

Comment by Craig Wilson [ 19/Jan/16 ]

Moshe,

This seems like a completely different issue. I've created CSHARP-1538 to work through this. I'll be asking a questions on the new ticket, so please check and send your answers there.

Craig

Comment by Moshe Shperling [ 19/Jan/16 ]

i call connect once per process same regarding MongoServer

Comment by Craig Wilson [ 19/Jan/16 ]

Moshe,

Are you calling Connect a lot? Technically, you don't even need to call Connect. Even if you are, it only needs to be called once. The reason I'm interested is because it makes me wonder about what the code looks like below this... Could you provide a snippet of code around this call and perhaps some context for it? For instance, do you get a new MongoServer and call Connect for every web request?

Craig

Comment by Moshe Shperling [ 19/Jan/16 ]

hi Craig
i am speaking about working environment , which connects successfully 99% of the time, but once in a while we have this exception i sent you.

Comment by Craig Wilson [ 19/Jan/16 ]

Hi Moshe,

First off, please remember this is a public ticket and your stack trace contained your server information. I've removed it.

The stack trace you posted above is telling us it cannot talk to the server in your connection string. That's what the Disconnected means above. Please ensure that your hostname and ports are correct and that you have network connectivity to this box. One way to check this is to use the mongo shell.

Craig

Comment by Moshe Shperling [ 19/Jan/16 ]

here is what we add to connection string connecttimeout=300000;maxpoolsize=800;waitQueueSize=2000;waitQueueTimeout=5m.

but there are still bottlenecks, it did not happen in the old legacy driver

Comment by Moshe Shperling [ 19/Jan/16 ]

hi
we have upgraded to 2.2.2 and even though we see the improvement, we still have connection timeout issues. here is the stacktrace:
Unhandled Exception: System.TimeoutException: A timeout occured after 30000ms se
lecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServer
Selector{ ReadPreference =

{ Mode = Primary, TagSets = [] }

}, LatencyLimitingSe
rverSelector

{ AllowedLatencyRange = 00:00:00.0150000 }

}. Client view of cluster
state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "Unknown", Sta
te : "Disconnected", Servers : [{ ServerId: "

{ ClusterId : 1, EndPoint : "Unspec ified/host:port" }

", EndPoint: "Unspecified/host/port", State: "Disconnected", Type: "Unknown" }] }.
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector
selector, ClusterDescription description)
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.Handl
eCompletedTask(Task completedTask)
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChanged(IServerSele
ctor selector, ClusterDescription description, Task descriptionChangedTask, Time
Span timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Clusters.Cluster.SelectServer(IServerSelector selector
, CancellationToken cancellationToken)
at MongoDB.Driver.MongoServer.Connect(TimeSpan timeout)
at MongoDB.Driver.MongoServer.Connect()

Comment by Githook User [ 14/Jan/16 ]

Author:

{u'username': u'rstam', u'name': u'rstam', u'email': u'robert@robertstam.org'}

Message: CSHARP-1515: Remove unnecessary classes.
Branch: v2.2.x
https://github.com/mongodb/mongo-csharp-driver/commit/b62b11e3a7f0a33831dd07018d5ff2fe126597c7

Comment by Githook User [ 14/Jan/16 ]

Author:

{u'username': u'rstam', u'name': u'rstam', u'email': u'robert@robertstam.org'}

Message: CSHARP-1515: Dispose of cursor in AsyncCursorEnumeratorAdapter.
Branch: v2.2.x
https://github.com/mongodb/mongo-csharp-driver/commit/ed71e559e14b791d290acae86d0d254a4e753a87

Comment by Githook User [ 13/Jan/16 ]

Author:

{u'username': u'rstam', u'name': u'rstam', u'email': u'robert@robertstam.org'}

Message: CSHARP-1515: Remove unnecessary classes.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/482ea3d09cdcabed4ea06b2f8790cc9ce89ada93

Comment by Githook User [ 13/Jan/16 ]

Author:

{u'username': u'rstam', u'name': u'rstam', u'email': u'robert@robertstam.org'}

Message: CSHARP-1515: Dispose of cursor in AsyncCursorEnumeratorAdapter.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/38590103ae76d9ea5f5398b1e7648f6ba7760fb2

Comment by Craig Wilson [ 13/Jan/16 ]

Our release notes document what was fixed and what new features were added. You can see those here: https://github.com/mongodb/mongo-csharp-driver/releases/tag/v2.2.0.

Comment by Moshe Shperling [ 13/Jan/16 ]

ok.
last thing: can u refer of any known issues with that version?

Comment by Craig Wilson [ 13/Jan/16 ]

You can get the dll's from the same place you got the 2.2 version. Either from github releases or from nuget.

It supports server 3.2, but not any new features. You would absolutely want to test your application before deploying to production.

Comment by Moshe Shperling [ 13/Jan/16 ]

ok.
how can i get the dll?
does it support mongo 3.2?

Comment by Craig Wilson [ 13/Jan/16 ]

Yes. Every 2.x release supports the legacy API.

Comment by Moshe Shperling [ 13/Jan/16 ]

our flow is using legacy api. does 2.1 support it?

Comment by Craig Wilson [ 13/Jan/16 ]

moshe@imonlinegroup.com, can you elaborate on why you can't downgrade? You mentioned SCRAM-SHA-1, but that is supported in driver version 2.1. Which feature is preventing the downgrade?

Comment by Moshe Shperling [ 13/Jan/16 ]

i am glad you managed to reproduce it, and hope you will provide us with fix soon. we need it badly, as i told u we cannot downgrade.

thanks.

Comment by Robert Stam [ 13/Jan/16 ]

I have been able to reproduce this.

In order to reproduce this I had to increase the size of the collection my test program was reading so that the number of documents being returned was more than one fit in a single batch.

Comment by Robert Stam [ 13/Jan/16 ]

What would help the most is a small self contained application that reproduces the problem.

Then I can run that locally and figure out what is happening.

Comment by Moshe Shperling [ 13/Jan/16 ]

hi Robert
What can i do for you to help you reproducing it? i can give stack trace, screen shot, team viewer... what will help you?

Comment by Robert Stam [ 13/Jan/16 ]

I have not been able to reproduce this. I am using a test program with 400 threads. Each thread repeatedly reads the entire contents of a collection in an infinite loop.

Using the default connection pool settings of 100 connections and a wait queue size of 500, my test program would have 100 threads actively reading at one time while the other 300 threads are waiting to acquire a connection. Since 300 < 500 I would not expect any WaitQueueFullExceptions.

My test program has been running overnight without throwing a single TimeoutException.

I am using version 2.2.1 of the driver and using the Legacy API.

Comment by Moshe Shperling [ 13/Jan/16 ]

hi Robert we are still getting this error. did u manage to reproduce it. please assist here.

Comment by Moshe Shperling [ 12/Jan/16 ]

Robert your fixes to connection string solved it for me currently. i will keep an eye on it.

thanks

Comment by Robert Stam [ 12/Jan/16 ]

2.3 is not out. The current version is 2.2.1.

We are trying to reproduce this but have not yet been able to.

Comment by Moshe Shperling [ 12/Jan/16 ]

hi
i also have the same issue. my question is: i saw that there is 2.3 out

  • does it have it fixed?
  • how stable it is?
  • where can i get it?

it breaks our system and we cannot downgrade since all our users use SCRAM-SHA-1 for auth and our coded uses legacy and the old legacy supports only old auth.

please help it is very URGENT!

Comment by Robert Stam [ 12/Jan/16 ]

It's difficult to tell from the stack trace whether this is a bug or simply a result of a heavy load running on top of a connection pool that is too small to support that load.

The default connection pool size is 100 connections, and the default size of the wait queue (tasks/threads that are waiting for a connection to be released back into the connection pool) is 500.

In a heavily loaded system it is possible for there to be more tasks/threads than there are connections, and in that scenario it is possible for tasks/threads to get a TimeoutException while waiting for a connection to be release back to the connection pool.

Can you try configuring a larger connection pool? And/or a larger wait queue size and wait queue timeout? That would help determine whether this exception is simply the result of a heavy load, or whether there is a bug related to the connection pooling.

These settings can all be configured on the connection string:

mongodb://hostnames/?maxPoolSize=100;waitQueueSize=500;waitQueueTimeout=2m

The values shown in the example above are the default values.

Comment by Robert Stam [ 12/Jan/16 ]

The "after 0ms" in the error message is a red herring. There is a bug in the error message.

See CSHARP-1528.

Comment by Seva Feldman [ 12/Jan/16 ]

Same issue . I understand 2.2.1 didn't solve the problem.

Comment by Kevin Fairs [ 12/Jan/16 ]

We have hit this exact same issue with 2.2, rolling back to 2.1 works for now, but our upgrade to server 3.2 is now on hold until this is resolved. Is there any timescale for resolution? Thanks.

Comment by Ben Origas [ 08/Jan/16 ]

Have you had a chance to investigate this at all? I'd like to update the server to 3.2, but can't until we figure this out so I can run the 2.2 driver.

Comment by Craig Wilson [ 23/Dec/15 ]

Hi Ben,

There was a large-ish change in which we added a sync stack down through Core and all the legacy API, which you are using, goes through that. It's possible there is some type of connection leak going on. We'll take a look as soon as we get back. For now, I'd recommend downgrading to either 2.1 or back to 2.0.

Craig

Comment by Ben Origas [ 23/Dec/15 ]

Any ideas?

Comment by Ben Origas [ 18/Dec/15 ]

var connectionString = @"mongodb://fake-server01.ng.com:27017,fake-server02.ng.com:27017/?replicaSet=rs0";
_mongoClient = new MongoClient(connectionString);
_mongoServer = _mongoClient.GetServer();
 
var conventionPack = new ConventionPack() { new IgnoreExtraElementsConvention(true) };
ConventionRegistry.Register("IgnoreExtraElements", conventionPack, type => true);
 
_mongoDatabase = _mongoServer.GetDatabase(dbName);

We then hold onto _mongoDatabase and use it for the lifetime of the app. It seems to work for the 1st few requests, but after 5+ min, we start getting timeouts

Comment by Craig Wilson [ 18/Dec/15 ]

Could you provide your connection string and any setup code you have?

Generated at Wed Feb 07 21:39:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.