[CSHARP-2689] Occasional extremely poor performance when connection pool is exhausted Created: 14/Aug/19  Updated: 27/Oct/23  Resolved: 18/Nov/19

Status: Closed
Project: C# Driver
Component/s: Performance
Affects Version/s: 2.8.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Anders Theodorsen Assignee: Robert Stam
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10 Version 1703
.NET Framework 4.6.1 & .NET Core 2.2
MongoDB 4.0.5 & 4.0.9


Attachments: Zip Archive MongoDbPerfIssue.zip    

 Description   

Issue

We use use the C# MongoDB driver in an ASP .NET Core application and experience occasionally extremely poor performance. Some requests take 80-90 seconds, where they normally take ~1 seconds. This problem is triggered there is issued a large number of parallel tasks which fetches data from MongoDB. The data is fetched using the Async API.

Repro & Test Results

I've put together a minimal repro example illustrating the problem. The Visual Studio solution is attached on this ticket.

The issue occurs when the following is run:

async Task Fetch()

Unknown macro: {   await collection.Find(filter).ToListAsync();   await collection.Aggregate<BsonDocument>(pipeline).ToListAsync(); }

var tasks = Enumerable.Range(0, count).Select(i => Fetch()).ToList();
await Task.WhenAll(tasks);

The filter is simply for a single _id and the pipeline is a match for the same _id. The collection contains a single document with the _id requested. The 'count' variable is used to test various number in parallel. Strangely the problem goes away if I swap the order of the Find & Aggregate queries, or comment out one of them.

When this code is run it starts to run extremely slow with a count of ~200-250 tasks. However it appears to run slow only the first time. Here is some run times with various counts, each time time the application is restarted and the queries are run two times:

Run 1. Fetched 150 in 606 ms
Run 2. Fetched 150 in 192 ms

Run 1. Fetched 200 in 46 801 ms
Run 2. Fetched 200 in 606 ms

Run 1. Fetched 250 in 68 233 ms
Run 2. Fetched 250 in 1 667 ms

Run 1. Fetched 20 in 359 ms
Run 2. Fetched 2500 in 2 843 ms

At ~200 the first parallel queries take over 46 seconds. However if the first query has few in parallel, the second handles a lot more (Like the last example above, where running 2500 i parallel finishes in only 2.8 seconds). When the queries run fast (the first example above), I see in Windows Task Manager the number of threads for the application staying around ~30, when the problem occurs the number of threads increases by 1 for every second. So to me, when the problem occurs it appears that even though I'm using the Async methods, something is blocking causing new threads to constantly be spawned.

If the problem only would occur at the first request it would not be such a big deal, however in our ASP .NET Core application we see the problem not only for the first request, but occasionally (The IMongoClient & IMongoDatabase are singletons).

Temporary Fix

Increasing the MaxConnectionPool of the MongoClient to f.ex. 1000 makes the problem disappear. However I think this is a band-aid fix, just masking the problem.

Sidenote

I suspect this issue is somewhat related to: https://jira.mongodb.org/browse/CSHARP-1895

When I change the fetch function in the repro example to use the Sync API:

async Task Fetch()
{
  await Task.Run(() =>
 

Unknown macro: {     collection.Find(filter).ToList();     collection.Aggregate<BsonDocument>(pipeline).ToList();   }

);
}

I get the following example after 30 seconds:

System.TimeoutException: 'A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector

Unknown macro: { AllowedLatencyRange = 00}

}. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "

Unknown macro: { ClusterId }

", EndPoint: "Unspecified/localhost:27017", State: "Disconnected", Type: "Unknown", LastUpdateTimestamp: "2019-08-14T09:37:22.4361867Z" }] }.

Which is the same exception seen in 1895.

 

 

 



 Comments   
Comment by Robert Stam [ 18/Nov/19 ]

When writing an async program you need to be careful to be 100% async. If any of your code blocks it will end up affecting performance.

> But is it really necessary to do two awaits to get the result for Aggregate asynchronously?

Yes it is. The call to `AggregateAsync` returns a cursor, and the call to `ToListAsync` reads everything up until the end of the cursor and returns the results as a List.

If the query returns large amounts of data instead of calling `ToListAsync` to read all the data at once, you would want to work with the cursor directly to process a batch of results at a time.

>As a side note, the documentation (https://mongodb.github.io/mongo-csharp-driver/2.8/reference/driver/crud/reading/#aggregation) also uses Aggregate().ToListAsync() as an async example which then can block in certain cases.

There are two forms of the Aggregate methods:

  1. The Aggregate(pipeline) and the AggregateAsync(pipeline) methods which return a cursor
  2. The Aggregate() method which returns a fluent interface for building an aggregation pipeline

The example you referred to is using the second form.

Comment by Anders Theodorsen [ 14/Aug/19 ]

As a side note, the documentation (https://mongodb.github.io/mongo-csharp-driver/2.8/reference/driver/crud/reading/#aggregation) also uses Aggregate().ToListAsync() as an async example which then can block in certain cases.

Comment by Anders Theodorsen [ 14/Aug/19 ]

D'oh, it appears this problem is due to misuse on our side.

I downloaded the source code for the Mongo C# driver and debugged. The problem occured in ExclusiveConnectionPool.cs:

public bool Wait(TimeSpan timeout, CancellationToken cancellationToken)
{
  return _semaphore.Wait(timeout, cancellationToken);
} 

This causes the thread to block explaining the behaviour, so I traced the stack and it traces back to the .Aggregate() method, I didn't realize there is a AggregateAsync, and didn't think there was a need for it since .ToListAsync() is called after it at the end.

So changing the 'Fetch' function in the sample to the following solves the issue:

async Task Fetch()
{
  await collection.Find(filter).ToListAsync();
  await (await collection.AggregateAsync<BsonDocument>(pipeline)).ToListAsync();
}
 

But is it really necessary to do two awaits to get the result for Aggregate asynchronously?

I also see the 'Find' method has a 'FindAsync' equivalent. When using the 'ToListAsync' on them, is it still necessary/recommended to use the 'FindAsync'? (And then double awaiting). As I haven't been able to see/reproduce the issue with only 'Find().ToListAsync()' type queries.

Generated at Wed Feb 07 21:43:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.