[CSHARP-331] System.IO.IOException: Unable to read data from the transport connection
Created: 27/Sep/11  Updated: 02/Apr/15  Resolved: 14/Oct/11

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.2
Fix Version/s: 1.3

Type: Bug
Priority: Major - P3
Reporter: Huy Nguyen
Assignee: Robert Stam
Resolution: Done
Votes: 0
Labels: concurrency, connection, driver, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongod 2.0 on Windows 2008 (single node). C#/.NET 3.5 with driver 1.1/1.2, running in a service on Windows 2008.


Attachments: App.config, CONST.cs, CrawlRequestDB.cs, Debug & Error Driver v1.2.zip, Error.zip, FlushableStore.cs, MongoAccess.cs, mongo.zip
Backwards Compatibility: Major Change

 Description   

We run multiple web crawlers, dumping data into a single mongod node running on Windows 2008. All crawlers are .NET 3.5 on Windows 2008 machines as well. I routinely get this error:

System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Initially, when the db was smaller, it happened less frequently, but now, as the db grows (currently only ~13 GB), it is happening about once every 3-4 seconds and it slows everything down. This happens even if I turn off ALL crawlers and leave only 1 instance running.

I should note that the crawler is multithreaded. The thread pool on the crawler is usually kept at 32-64 threads, and a lot of data is flushed into mongod in batches of 200-500 items at a time, every 10-30 seconds (depending on the collection).

Attached are the full mongod log file, along with one of the service error logs (so you can see the full errors). I've also attached the mongo connection wrapper that we are using and a few .cs files of interest so you can see how data is dumped into mongo.

We also notice that with driver v1.1 the error is a little less frequent than with v1.2. In the driver, we are setting the request timeout to 10 seconds.



 Comments   
Comment by Robert Stam [ 14/Oct/11 ]

Glad to hear that everything is working great now! I will go ahead and close this ticket now. Feel free to follow up as needed.

Comment by Huy Nguyen [ 14/Oct/11 ]

Robert, sorry for the late reply. I've been swamped. I was able to go back to my code and figure out the issues based on your comments & suggestions. You were right, I had server.Connect(), Reconnect(), & Disconnect() calls in places. Taking these out completely eliminated the high connection count.
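
For illustration only, the kind of change described above might look like the following minimal sketch, which is not the attached MongoAccess.cs wrapper. It assumes the 1.x driver's `MongoServer.Create`, and that the driver connects on first use and manages its own connection pool, so the application never calls Connect, Reconnect, or Disconnect itself. The class name `SharedMongo` and the connection string are hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical wrapper: one MongoServer instance shared by all crawler threads.
public static class SharedMongo
{
    // Created once per process. Assumption: the driver opens and pools
    // connections as needed, so no explicit Connect() call is made here.
    private static readonly MongoServer Server =
        MongoServer.Create("mongodb://localhost");

    // Worker threads call this instead of creating, reconnecting, or
    // disconnecting their own server objects.
    public static MongoCollection<BsonDocument> GetCollection(
        string databaseName, string collectionName)
    {
        return Server.GetDatabase(databaseName)
                     .GetCollection<BsonDocument>(collectionName);
    }
}
```

Each thread would then call something like `SharedMongo.GetCollection("crawler", "crawlRequests")` (both names are placeholders) and leave connection lifetime to the driver.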

Going back to the slow query log, I was able to identify the queries that required indexes. After doing these 2 things, we got stellar performance from Mongo 2.0 & the new 1.2 C# driver, which is great news. We will scale this project up on Mongo & switch to the new driver on our other production systems.

Thank you so much for your help.

Comment by Robert Stam [ 07/Oct/11 ]

Any updates on this? I tried looking at your code but got lost in the multiple levels of nested lambdas (the lambdas also make the stack traces have weird names in them).

The server log shows lots of operations that are taking multiple seconds. If you are really setting a 10 second timeout (I couldn't find where you were doing that) you may just be getting lots of timeouts because your timeout is too low.

The other thing that looked odd in the server log was that the connection numbers were really high, which seems to indicate that the connections are being closed and reopened a lot. Are you calling Disconnect or Reconnect anywhere?

Let me know if you have any more information.

Comment by Robert Stam [ 29/Sep/11 ]

These exceptions are thrown by the C# driver when the server does not respond within the timeout period (default 30 seconds), so it's not really correct to classify this as a C# driver bug. If you routinely expect operations to take a long time, you can configure a longer timeout period. But what you really need to do is find out why the server operations are slow. Often it's just a matter of adding the right index.
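
As a hedged illustration of the two options above (a longer timeout and an added index), here is a minimal sketch against the 1.x driver API. The database name `crawler`, the collection name `crawlRequests`, the field names `status` and `priority`, and the 120-second value are assumptions made up for this example and are not taken from the attached code. It also assumes the connection string accepts the `socketTimeoutMS` option.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public static class TimeoutAndIndexSketch
{
    public static void Main()
    {
        // Assumption: socketTimeoutMS raises the socket timeout above the
        // 30-second default so slow operations are not cut off client-side.
        var server = MongoServer.Create(
            "mongodb://localhost/?socketTimeoutMS=120000");

        // Hypothetical database and collection names.
        var database = server.GetDatabase("crawler");
        var requests = database.GetCollection<BsonDocument>("crawlRequests");

        // Index the fields that the query and sort below use, so the server
        // does not have to scan the whole collection for each call.
        requests.EnsureIndex(IndexKeys.Ascending("status", "priority"));

        // Claim the next pending item; the query and sort match the index.
        var result = requests.FindAndModify(
            Query.EQ("status", "pending"),
            SortBy.Ascending("priority"),
            Update.Set("status", "processing"));

        // ModifiedDocument is null when no document matched the query.
        Console.WriteLine(result.ModifiedDocument);
    }
}
```

Raising the timeout only hides slow operations; the index is what addresses the cause, so the timeout change is best treated as a stopgap.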

Comment by Huy Nguyen [ 28/Sep/11 ]

I also notice that most (if not all) of the errors thrown are caused by the FindAndModify or FindAndRemove methods. Almost everything else is flushed to the DB using unsafe writes, so there are no errors for those. Even the batch inserts or upserts with safe mode are responding OK.

I should note that the FindAndModify or FindAndRemove methods sometimes take upward of 400 seconds (as logged on the mongod server). However, such slow operations show up far less often than the errors being thrown.

Comment by Huy Nguyen [ 28/Sep/11 ]

Attached (Debug & Error Driver v1.2.zip) is a sample run log with the version 1.2 driver, since this version throws a slightly different error than version 1.1.

In v1.1 it typically throws IOException; in v1.2 it mainly throws TimeoutException.

The zip contains both the full debug log & the error-only log.

Generated at Wed Feb 07 21:36:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.