[CSHARP-454] "Deadlock" in connection pool management Created: 25/Apr/12  Updated: 02/Apr/15  Resolved: 26/Apr/12

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.4.1
Fix Version/s: 1.4.2

Type: Bug Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: c#, connections, deadlock, driver
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible

 Description   

Recently, we started getting occasional unexplained "hangs" that coincided with hitting the connection pool upper limit. I spent some time debugging but could not reproduce the case on a test machine, so I waited until one of our servers hung, then dumped the app server process and loaded the dump into WinDBG.
I'll spare you the details, but the culprit in our case lies in the MongoServer.RequestStart method. RequestStart locks on _serverLock, then calls MongoServerInstance.AcquireConnection, which in turn calls MongoConnectionPool.AcquireConnection. When the connection pool limit has already been hit, MongoConnectionPool.AcquireConnection starts waiting on _connectionPoolLock with a timeout (the wait queue). Unfortunately, MongoServer.ReleaseConnection() also locks on MongoServer._serverLock, so no connections can be released back to the pool while that wait is in progress, and connection management stalls for WaitQueueTimeout.
Other suspicious methods (ones that access the connection pool to acquire connections) include MongoServer.VerifyState and MongoServer.ChooseServerInstance (due to its call to MongoServer.VerifyUnknownState). Take note that while I think they may contain a similar locking pattern, I'm not exactly sure, and I have not yet observed problems related to these two methods (although VerifyState certainly looks like it has the same problem).
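
To make the locking pattern concrete, here is a minimal, language-neutral model of it in Python. It is not the driver's code: ToyPool, StallingServer and demo_stall are hypothetical names, a semaphore stands in for the wait queue, and the timeout value is arbitrary. It only shows that holding the server lock while waiting on an exhausted pool blocks the release path until the wait times out.

```python
import threading
import time


class ToyPool:
    """Hypothetical pool: acquire() waits up to a timeout when no slot is free."""

    def __init__(self, size, wait_queue_timeout):
        self._slots = threading.Semaphore(size)
        self._timeout = wait_queue_timeout

    def acquire(self):
        # Returns True on success, False if the wait timed out.
        return self._slots.acquire(timeout=self._timeout)

    def release(self):
        self._slots.release()


class StallingServer:
    """Models the reported pattern: the server lock is held across the pool wait."""

    def __init__(self, pool):
        self._server_lock = threading.Lock()
        self._pool = pool

    def request_start(self):
        with self._server_lock:
            # The lock stays held for the whole wait on the pool.
            return self._pool.acquire()

    def release_connection(self):
        with self._server_lock:
            # Needs the same lock, so it cannot run while request_start waits.
            self._pool.release()


def demo_stall(timeout=0.5):
    server = StallingServer(ToyPool(size=1, wait_queue_timeout=timeout))
    assert server.request_start()  # takes the only connection

    result = {}
    waiter = threading.Thread(
        target=lambda: result.update(got=server.request_start()))
    waiter.start()
    # Wait until the second request holds the server lock.
    while not server._server_lock.locked():
        time.sleep(0.01)

    started = time.monotonic()
    server.release_connection()  # blocked until the waiter times out
    blocked_for = time.monotonic() - started
    waiter.join()
    return result["got"], blocked_for


if __name__ == "__main__":
    got, blocked_for = demo_stall()
    print(f"waiter got a connection: {got}; release blocked for {blocked_for:.2f}s")
```

In this model the waiting request never receives the connection, even though its holder tried to give it back, and the release call itself is delayed by roughly the wait queue timeout.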

I would like to note that this is a very disruptive issue, because sooner or later it brings down any server that is approaching a certain load. The most obvious workaround is increasing the connection pool limit, and it appears to solve the issue, but it doesn't feel like a proper long-term solution.



 Comments   
Comment by Aristarkh Zagorodnikov [ 27/Apr/12 ]

Good to hear, waiting for 1.4.2 release =)

Comment by Robert Stam [ 26/Apr/12 ]

This should be fixed now. Changes include:

1. RequestStart/Done now release the lock before calling out to other methods
2. Ping and VerifyState now use a new connection instead of one from the connection pool

Using a new connection for Ping and VerifyState prevents these methods from being stalled when the connection pool is oversubscribed. Opening and closing a connection for just this purpose is not too much overhead because it's only done every few seconds (every 10 seconds at the moment).

There are also minor changes to MongoConnection reflecting the fact that we can now have a connection that is not part of the connection pool.
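
As an illustration of the first change only (releasing the lock before calling out), the following is a hedged Python sketch rather than the actual driver change. ToyPool, FixedServer and demo_fixed are hypothetical names, and the semaphore-based pool is an assumption made for the example. The server lock is held just long enough to read shared state, and the potentially long wait on the pool happens outside it.

```python
import threading
import time


class ToyPool:
    """Hypothetical pool: acquire() waits up to a timeout when no slot is free."""

    def __init__(self, size, wait_queue_timeout):
        self._slots = threading.Semaphore(size)
        self._timeout = wait_queue_timeout

    def acquire(self):
        # Returns True on success, False if the wait timed out.
        return self._slots.acquire(timeout=self._timeout)

    def release(self):
        self._slots.release()


class FixedServer:
    """Holds the server lock only while reading shared state, not while waiting."""

    def __init__(self, pool):
        self._server_lock = threading.Lock()
        self._pool = pool

    def request_start(self):
        with self._server_lock:
            pool = self._pool  # short critical section
        return pool.acquire()  # the wait happens with the lock released

    def release_connection(self):
        with self._server_lock:
            pool = self._pool
        pool.release()


def demo_fixed(timeout=2.0):
    server = FixedServer(ToyPool(size=1, wait_queue_timeout=timeout))
    assert server.request_start()  # takes the only connection

    result = {}
    waiter = threading.Thread(
        target=lambda: result.update(got=server.request_start()))
    waiter.start()
    time.sleep(0.1)  # give the second request time to start waiting

    started = time.monotonic()
    server.release_connection()  # no longer blocked by the waiting request
    blocked_for = time.monotonic() - started
    waiter.join()
    return result["got"], blocked_for


if __name__ == "__main__":
    got, blocked_for = demo_fixed()
    print(f"waiter got a connection: {got}; release blocked for {blocked_for:.2f}s")
```

With the wait moved outside the lock, the release goes through immediately and the waiting request picks up the freed connection well before its timeout.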

Comment by Robert Stam [ 26/Apr/12 ]

Thanks for reporting this. We're working on it.

Comment by Aristarkh Zagorodnikov [ 26/Apr/12 ]

Also, it appears that this problem has been there for some time; it just became more visible once CSHARP-408 was implemented.
