[CSHARP-1619] Timeouts when trying to connect to MongoDB on Linux Created: 04/Apr/16 Updated: 12/Dec/18 Resolved: 18/Aug/16
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | shai bar | Assignee: | Craig Wilson |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux server |
| Description |
|
We are trying to connect to the mongo server using the .NET driver and are getting this exception: "The wait queue for acquiring a connection to server 172.168.12.100:27017 is full."
| Comments |
| Comment by Robin Munn [ 06/Sep/16 ] |
|
Andrew Stewart is correct; this is a bug in Mono 3.12. I added a comment to

In Mono 4.0, they switched to using the Microsoft reference implementation for many classes, including SemaphoreSlim. Since the Microsoft reference implementation doesn't suffer from this particular race-condition bug, you'll see no more spurious MongoWaitQueueFullExceptions being thrown if you just upgrade to Mono 4.0. If upgrading Mono is not an option for you, though, I'm not sure what the best solution is.
| Comment by Andrew Stewart [ 07/Jul/16 ] |
|
Hmm, I looked a bit more into this, and it seems in this case it's just running into a bug with Mono. With Mono v3.12.2, the following code reliably crashes in a few minutes with n=10, or a few seconds with n=100. While I didn't leave it running for hours, it never seems to crash running on Windows or with Mono v4.4.0, so they must have changed something in later versions that fixed it.
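The snippet itself did not survive this export; what follows is only a minimal sketch of that kind of stress test, assuming it drove SemaphoreSlim.Wait(0) and Release() from n concurrent workers against a semaphore that is never anywhere near exhausted (class and variable names are illustrative, not the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreSlimRepro
{
    // Stands in for the driver's wait queue: capacity far above the number of
    // workers, so Wait(0) should never legitimately return false.
    static readonly SemaphoreSlim WaitQueue = new SemaphoreSlim(500, 500);

    static void Main(string[] args)
    {
        // n = 10 crashes within minutes on Mono 3.12.2, n = 100 within seconds,
        // per the comment above.
        int n = args.Length > 0 ? int.Parse(args[0]) : 10;

        var workers = new List<Task>();
        for (int i = 0; i < n; i++)
        {
            workers.Add(Task.Run(() =>
            {
                while (true)
                {
                    // With only n (far fewer than 500) concurrent holders,
                    // Wait(0) should always succeed.
                    if (!WaitQueue.Wait(0))
                    {
                        throw new Exception("Wait(0) returned false; CurrentCount = "
                                            + WaitQueue.CurrentCount);
                    }
                    try
                    {
                        Thread.Sleep(1); // briefly hold the slot, like a pooled connection
                    }
                    finally
                    {
                        WaitQueue.Release();
                    }
                }
            }));
        }

        // Faults with the exception above as soon as any worker observes the race.
        Task.WaitAll(workers.ToArray());
    }
}
```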
| Comment by Craig Wilson [ 07/Jul/16 ] |
|
Thanks Andrew. That is really weird; I wonder if Mono is doing something different than .NET. I've never seen this behavior before, and it seems to run counter to the expectations of the method. We'll look into it. Craig
| Comment by Andrew Stewart [ 07/Jul/16 ] |
|
I've also run into this or a similar issue. We're running Mono v3.12.2 on CentOS 6.5 using v2.2.4 of the C# driver. Our application makes around a dozen concurrent connections to MongoDB, asynchronously inserts a number of documents into a different collection on each connection, and then waits for all of the inserts to complete before moving on to the next batch of data. After a number of hours, we see the application crash with the error "MongoDB.Driver.MongoWaitQueueFullException: The wait queue for acquiring a connection to server localhost:27017 is full.", which doesn't really make sense, as the wait queue size is 1250 and we never have 1250 concurrent connections open.

I instrumented the driver code to log more information on crashes to try to determine what the problem might be. I changed the CheckingOutConnection method in the AcquireConnectionHelper class in ExclusiveConnectionPool.cs to the following, just adding a line to log the wait queue's CurrentCount:
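The modified method was not preserved in this export; the sketch below shows the shape of the described change, with the added logging line marked (the surrounding code is approximate and field names may not match the actual driver source):

```csharp
// Inside ExclusiveConnectionPool.AcquireConnectionHelper (approximation, not the exact driver source).
public void CheckingOutConnection()
{
    // Non-blocking attempt to enter the wait queue semaphore.
    bool enteredWaitQueue = _pool._waitQueue.Wait(0);
    if (!enteredWaitQueue)
    {
        // Added instrumentation: log how many free slots the semaphore reports
        // at the exact moment Wait(0) fails.
        Console.WriteLine("Wait(0) returned false; _waitQueue.CurrentCount = "
                          + _pool._waitQueue.CurrentCount);
        // ...the original code then throws MongoWaitQueueFullException as before...
    }
    // ...remainder of the original method unchanged...
}
```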
I then replaced the DLL with the newly built one. And indeed, after several hours of running, the application crashed again; however, what it logged was a CurrentCount showing that the queue was not actually full.

So the problem isn't that the queue is full; it's that calling _pool._waitQueue.Wait(0) can return false even when there is still room available in the wait queue. I'm not immediately sure of a good way to distinguish between the wait queue actually being full and Wait(0) returning false because it just took too long, but the current error message is misleading at best. Let me know if there's any other debugging information I could provide about this. Andrew
| Comment by Craig Wilson [ 04/Apr/16 ] |
|
Thanks so much, Shai. In that timeout exception, there is a HeartbeatException message: System.Runtime.InteropServices.SEHException: External component has thrown an exception. This occurs outside the driver and, because of the nature of this exception, we don't actually know what is going on.

So, can you enable network tracing? Specifically, we care about System.Net and System.Net.Sockets. If you could provide traces covering a period where these errors show up, that would prove immensely helpful. Craig
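For reference, network tracing for those sources is turned on in the application's config file; the following is a sketch of the standard System.Net tracing section, where the listener file name and verbosity are only examples:

```xml
<configuration>
  <system.diagnostics>
    <sources>
      <source name="System.Net" tracemode="includehex" maxdatasize="1024">
        <listeners>
          <add name="NetworkTrace" />
        </listeners>
      </source>
      <source name="System.Net.Sockets">
        <listeners>
          <add name="NetworkTrace" />
        </listeners>
      </source>
    </sources>
    <switches>
      <add name="System.Net" value="Verbose" />
      <add name="System.Net.Sockets" value="Verbose" />
    </switches>
    <sharedListeners>
      <!-- Example listener: writes the trace to network.log next to the executable -->
      <add name="NetworkTrace"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="network.log" />
    </sharedListeners>
    <trace autoflush="true" />
  </system.diagnostics>
</configuration>
```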
| Comment by shai bar [ 04/Apr/16 ] |
|
Hi Craig, Anyway, I changed my code to be like this (CreateClient is called in the constructor):
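The pasted code did not survive this export; a minimal sketch of the pattern described, assuming a single MongoClient is created by CreateClient in the constructor and reused for every collection lookup (class and member names are illustrative):

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

public class MongoRepository
{
    private readonly IMongoClient _client;
    private readonly IMongoDatabase _database;

    public MongoRepository(string connectionString, string databaseName)
    {
        // CreateClient is called once, in the constructor, so the whole
        // application shares one MongoClient and one connection pool.
        _client = CreateClient(connectionString);
        _database = _client.GetDatabase(databaseName);
    }

    private static IMongoClient CreateClient(string connectionString)
    {
        return new MongoClient(connectionString);
    }

    public IMongoCollection<BsonDocument> GetCollection(string name)
    {
        return _database.GetCollection<BsonDocument>(name);
    }
}
```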
And for your questions:

1) The timeout exception is this:

2) When I try querying the DB with a single request, it usually succeeds.
| Comment by Craig Wilson [ 04/Apr/16 ] |
|
The value is per mongo server. Since you only have 1 server listed for the client, you should see at most 101 connections to that server from your client (100 pooled connections plus 1 dedicated heartbeat connection).

Your method name (GetCollectionConnection) is a bit of a misnomer, as the collection doesn't contain a connection at all. Rather, it is attached to a pool of connections that the MongoClient is holding onto. I'd suggest you create a single instance variable of the MongoClient somewhere and always use that (a minimal sketch follows below).

The other exception message you have is semi-related. If you start hitting your mongo server with many requests and the client is unable to talk to the server, you could get a quick escalation to blowing out the wait queue. Eventually, you would start to see TimeoutExceptions as you are showing above. In fact, the TimeoutException is more interesting because it provides us much more information, so:

1) Could you provide the full message of the TimeoutException?
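A minimal sketch of that suggestion, assuming one shared client instance for the whole process (the holder class and connection string are illustrative):

```csharp
using MongoDB.Driver;

public static class MongoClientHolder
{
    // One MongoClient, and therefore one connection pool per server,
    // shared by the whole process instead of a new client per request.
    public static readonly IMongoClient Client =
        new MongoClient("mongodb://172.168.12.100:27017");
}
```

The MongoClient is thread-safe and intended to be reused, so a single shared instance is the usual pattern.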
| Comment by shai bar [ 04/Apr/16 ] |
|
Hi Craig, the connection string is:
I have a question about the "... }" part: we are working on a single server ... why does the mongo server / client think that we have a cluster? Thanx, Shai
| Comment by Craig Wilson [ 04/Apr/16 ] |
|
Hi Shai, Sorry you are having some trouble. There is a max connection pool size of 100 connections (by default) and some multiple of that for the wait queue (a sketch of those settings follows below). Essentially, you have opened so many tasks that haven't yet completed that you are blowing past that multiple.

To help us figure this out, I have a couple of questions:

1) You indicate in the title that you are having issues when talking with Linux. Do you not have issues when using Windows?

Thanks,
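For reference, a sketch of how those two limits can be adjusted on the client, assuming MongoClientSettings is used; the numbers are illustrative, not a recommendation:

```csharp
using System;
using MongoDB.Driver;

static class PoolSettingsExample
{
    // Sketch: raising the limits referred to above. The defaults are a pool of
    // 100 connections and a wait queue sized at a multiple of that.
    static IMongoClient CreateClient()
    {
        var settings = MongoClientSettings.FromUrl(
            new MongoUrl("mongodb://172.168.12.100:27017"));
        settings.MaxConnectionPoolSize = 200;                  // default: 100
        settings.WaitQueueSize = 1000;                         // how many acquisitions may queue
        settings.WaitQueueTimeout = TimeSpan.FromSeconds(30);  // how long each may wait
        return new MongoClient(settings);
    }
}
```

maxPoolSize can also be set directly in the connection string, though raising these limits mostly hides the underlying problem if a new client is being constructed per request.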