[CSHARP-3477] Potential deadlock in ExclusiveConnectionPool (MaintainSizeAsync) Created: 15/Mar/21  Updated: 27/Oct/23  Resolved: 09/Apr/21

Status: Closed
Project: C# Driver
Component/s: API
Affects Version/s: 2.12.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Raphael Rabl Assignee: James Kovacs
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

TargetFramework: .NET 5
Project type: Microsoft.NET.Sdk.Worker
OS: Windows 10
MongoDB: Community Edition 4.4.3 2008R2Plus SSL (64 bit) - installed as Windows Service



 Description   

Application info:

  • TargetFramework: .NET 5
  • Project type: Microsoft.NET.Sdk.Worker

For the past few days, my application has occasionally hung (deadlock?) after awaiting a call to FindAsync on an IMongoCollection.

For example:

var testCursor = await testCollection.FindAsync(_ => true);

the same also happens for:

var testCursor = await testCollection.FindAsync(_ => true).ConfigureAwait(false);

Any line after this call is never reached.

The problem might be caused by my misusing the API somewhere, but I haven't yet been able to create a minimal code example that reproduces the issue.

However, the deadlock always happens at this line: https://github.com/mongodb/mongo-csharp-driver/blob/51b48437814cd6d1f4882d1ca2387b1539c8f01f/src/MongoDB.Driver.Core/Core/ConnectionPools/ExclusiveConnectionPool.cs#L227

MaintainSizeAsync().ConfigureAwait(false);

I don't know whether that line omits await on purpose, but I figured it might be an oversight, so I'm posting what I found to make sure it is at least checked by a developer who knows more about the code.

If I manage to create a minimal reproducible example, I will let you know.



 Comments   
Comment by Backlog - Core Eng Program Management Team [ 09/Apr/21 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by James Kovacs [ 25/Mar/21 ]

Hi, Raphael,

We have taken a closer look at the MaintainSizeAsync code. What you observe is expected behaviour. We are starting a background task to periodically prune dead connections and refill the pool to its configured minimum size. We do not await the task because it never returns until the cancellation token is signalled by the disposal of the connection pool.
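
As a rough illustration of that pattern (a minimal sketch, not the driver's actual code): MaintainSizeSketchAsync, its onTick callback, and the 50 ms interval below are hypothetical stand-ins. The sketch assumes a loop that runs until a cancellation token is signalled, which is why the caller starts it without awaiting it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a pool maintenance loop; NOT the driver's code.
// It completes only once the token is cancelled, so awaiting it at the
// call site would block the caller for the lifetime of the pool.
static async Task MaintainSizeSketchAsync(Action onTick, CancellationToken token)
{
    try
    {
        while (!token.IsCancellationRequested)
        {
            onTick(); // placeholder for "prune dead connections, refill to minimum size"
            await Task.Delay(TimeSpan.FromMilliseconds(50), token).ConfigureAwait(false);
        }
    }
    catch (OperationCanceledException)
    {
        // Expected when the token is cancelled: the loop simply ends.
    }
}

var ticks = 0;
using var cts = new CancellationTokenSource();

// Started without await: the call returns at the loop's first await.
Task maintenance = MaintainSizeSketchAsync(() => Interlocked.Increment(ref ticks), cts.Token);
bool stillRunning = !maintenance.IsCompleted;

await Task.Delay(200); // the caller keeps working while the loop runs in the background
cts.Cancel();          // stands in for disposing the connection pool
await maintenance;     // the task completes only after cancellation

Console.WriteLine($"running after start: {stillRunning}, ticks: {ticks}, completed: {maintenance.IsCompleted}");
```

In this sketch the un-awaited call does not block anything after it; the task is only awaited once cancellation has been requested.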

If the code after FindAsync is never called, that indicates that the async task never completes. To diagnose this further, we need a self-contained repro, or we need to analyze the stack traces and memory in your application when it hangs (ideally both).

A self-contained repro is self-explanatory, but sporadic events such as this can be hard to reproduce. The second option is to collect process dumps and upload them to us for analysis. The next time your application hangs, please perform the following:

1. Install dotnet-dump:

dotnet tool install --global dotnet-dump

2. Collect three or more process dumps over a span of a few minutes:

dotnet dump collect -p PID --type Full

3. Attach all generated dump files (typically ./core_* (Linux) or ./dump_*.dmp (Windows)) to this ticket.

Step 2 is important as it allows us to compare the states of threads and locks over a span of time. If we only have a single dump file it can be difficult to determine if a thread just happened to be acquiring a lock when the dump was taken or if the thread has been blocked for a significant period of time attempting to acquire that lock.

Note that the dump files will contain the compressed process memory with any potentially sensitive data such as credentials or user data. If this is a concern, please contact us via this ticket and we will create a secure upload site that you can use to upload the dump files.

Please let us know if you have any questions.

Sincerely,
James

Comment by Mikalai Mazurenka (Inactive) [ 17/Mar/21 ]

Thanks inuriasx@gmail.com for your report. We will need some time to investigate it, and then we will come back to you.

Generated at Wed Feb 07 21:45:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.