[CSHARP-1457] C# driver susceptible to thread-starvation deadlocks Created: 26/Oct/15  Updated: 11/Mar/19  Resolved: 09/Dec/15

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: ZunTzu Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows



 Description   

Calling IMongoCollection<T>.InsertOneAsync from a thread of the .NET ThreadPool can lead to deadlock by thread-starvation (here is a link describing this phenomenon, see topic 'deadlock' on the page).

I suppose that the internal implementation of the C# driver relies on the .NET ThreadPool, either directly or indirectly through calls to the networking API of the .NET framework.

It could be argued that it is 'by design' and that the bug is in my code, which should not run in a thread from the .NET ThreadPool to begin with.

My answer would be that I should not have to make assumptions about the way the driver is implemented, therefore this is a case of a leaky abstraction. Furthermore, if this limitation is 'by design', the possibility of thread-starvation deadlock should be properly documented.

Here is an example of a program that exhibits the issue:

static void Main(string[] args)
{
    var client = new MongoClient("mongodb://localhost:27017");
    var database = client.GetDatabase("ThreadStarvationDeadlock");
    var collection = database.GetCollection<BsonDocument>("Foo");
 
    var tasks = new Task[500];
 
    for (int i = 0; i < tasks.Length; ++i)
    {
        tasks[i] = Task.Run(() =>
        {
            var document = new BsonDocument { { "Bar", 0 } };
            collection.InsertOneAsync(document).Wait();
        });
    }
 
    Console.WriteLine("Waiting...");
    Task.WaitAll(tasks);
 
    Console.WriteLine("Done. <Enter> to quit.");
    Console.ReadLine();
}



 Comments   
Comment by Robert Stam [ 09/Dec/15 ]

Now that we have a fully sync API in version 2.2, the driver itself never blocks a thread by calling Wait, so the the only way this could happen is if the application code itself calls Wait. But now that there is a sync API the application could just call the sync method instead of calling Wait on the Task returned by an async method.

Comment by ZunTzu [ 26/Oct/15 ]

Great suggestion. I'm confident the sync driver will fix all my problems. Thank you.

Comment by Craig Wilson [ 26/Oct/15 ]

Fair enough and I see your point. While we certainly attempt to support using the driver by calling Wait, we certainly don't recommend it for this very reason. My suggestion is to wait until we release the 2.2 driver which includes a sync stack all the way down and will not be susceptible to this issue.

I'll take a look at this line in the TcpStreamFactory and see if we can do something different.

Comment by ZunTzu [ 26/Oct/15 ]

Hi, Craig. Thank you for your great responsiveness once again.

Am I right to assume that your team sees the use of the driver in a synchronous manner - by calling Wait on all operations - as legitimate? If yes then how will your users avoid thread-starvation deadlock, seeing how pervasive the .NET thread pool has become?

The example program I gave is a toy example specially designed to illustrate the issue.

In my actual use case, MongoDB is part of the storage layer of a huge multi-threaded server-side program, over which source code I have little control. The program also runs plugins over which I have no control at all.
Therefore it is difficult to ensure that no one will try calling the storage layer from a thread pool thread.

By the way, line 101 of TcpStreamFactory.cs seems to be the one that uses the thread pool :

connectTask = Task.Factory.FromAsync(socket.BeginConnect(endPoint, null, null), socket.EndConnect);

If you were using your own thread pool this issue would simply not exist.

Comment by Craig Wilson [ 26/Oct/15 ]

Hi ZunTzu,

Thanks for the link. Internally, we do not ever block on a threadpool thread waiting for an asynchronous operation to complete. Here are some quotes pulled from the linked article:

In general, a deadlock can appear whenever a pool thread waits for an asynchronous function to finish. If we change the code so that we use the synchronous version of Connect, the problem will disappear:

If you want to avoid deadlocks in your applications, do not ever block a thread executed on the pool that is waiting for another function on the pool. This seems to be easy, but keep in mind that this rule implies two more:

Do not create any class whose synchronous methods wait for asynchronous functions, since this class could be called from a thread on the pool.

Do not use any class inside an asynchronous function if the class blocks waiting for asynchronous functions.

However, your sample above certainly suffers from this problem. Calling .Wait() inside the Task.Run method above is doing exactly what these quotes indicate should not happen. I'm not sure if the provided code is an excerpt or just a sample, but the entire <= 2.1 driver is async and you'll need to take that into account in it's usage. We are adding synchronous APIs into the 2.2 driver, and subsequently you'll be able to spawn a task and call some the synchronous InsertOne method.

I've changed your code above very lightly to make this work correctly:

tasks[i] = Task.Run(() =>
{
    var document = new BsonDocument { { "Bar", 0 } };
    return collection.InsertOneAsync(document);
});

Let me know your thoughts.
Craig

Generated at Wed Feb 07 21:39:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.