[CSHARP-1435] System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Created: 05/Oct/15  Updated: 13/Apr/16  Resolved: 04/Apr/16

Status: Closed
Project: C# Driver
Component/s: API
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Luca casali Assignee: Craig Wilson
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mongodb 2.6.11 - Cluster with one replicaset - Connection to shared routers


Attachments: Zip Archive Issue 2015-10-05.zip     Text File mongos_mywapp1.log     Text File mongos_mywapp2.log    
Issue Links:
Duplicate
duplicates CSHARP-1515 Queries using the Legacy API can leak... Closed

 Description   

In heavy CPU loading condition our Web Application Get Stucked.

All threads are blocked and after 30 second in the log are added those errors:

[ERROR] - 20151005 13:15:02:97615 - RequestId=1a863907-853e-41f7-8ea7-7d84ee81db83|RequestId=1a863907-853e-41f7-8ea7-7d84ee81db83|Error on EntityController.OnException
System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector

{ AllowedLatencyRange = 00:00:00.0150000 }

}. Client view of cluster state is { ClusterId : "1", Type : "Sharded", State : "Disconnected", Servers : [{ ServerId: "

{ ClusterId : 1, EndPoint : "Unspecified/mywapp1:27017" }

", EndPoint: "Unspecified/mywapp1:27017", State: "Disconnected", Type: "Unknown" }, { ServerId: "

{ ClusterId : 1, EndPoint : "Unspecified/mywapp2:27017" }

", EndPoint: "Unspecified/mywapp2:27017", State: "Disconnected", Type: "Unknown" }] }.

As you can see all nodes are disconnected.

This is our connection string:

mongodb://dbuser:XXXXXXXXX@mywapp1,mywapp2/?connect=ShardRouter&maxPoolSize=500&minPoolSize=50



 Comments   
Comment by John Hart [ 30/Jan/16 ]

I was actually able to solve this issue by upgrading the driver from source the same day CSHARP-1515 was resolved. I was able to reproduce the problem within minutes w/ the previous version but I haven't seen the issue since that was fixed.

Comment by Poltavets Andrey [ 29/Jan/16 ]

Connection string:
mongodb://mongo.bitpool.com:27017

rs.config() throw error:

2016-01-29T10:31:21.231+0600 E QUERY    [thread1] Error: error: { "$err" : "can't use 'local' database through mongos", "code" : 20 } :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
DBCommandCursor@src/mongo/shell/query.js:679:1
DBQuery.prototype._exec@src/mongo/shell/query.js:105:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:267:5
DBCollection.prototype.findOne@src/mongo/shell/collection.js:215:12
rs.conf@src/mongo/shell/utils.js:1090:16
@(shell):1:1

db.serverStatus() result:

{
	"host" : "bit-pool-cluster-3",
	"advisoryHostFQDNs" : [
		"bit-pool-cluster-3.bitpool.3137.mongodbdns.com"
	],
	"version" : "3.2.1",
	"process" : "mongos",
	"pid" : NumberLong(27918),
	"uptime" : 919274,
	"uptimeMillis" : NumberLong(919274741),
	"uptimeEstimate" : 777110,
	"localTime" : ISODate("2016-01-29T05:02:35.758Z"),
	"asserts" : {
		"regular" : 0,
		"warning" : 0,
		"msg" : 0,
		"user" : 7,
		"rollovers" : 0
	},
	"connections" : {
		"current" : 266,
		"available" : 50934,
		"totalCreated" : NumberLong(217250)
	},
	"extra_info" : {
		"note" : "fields vary by platform",
		"heap_usage_bytes" : 3934760,
		"page_faults" : 138
	},
	"network" : {
		"bytesIn" : NumberLong(2046862258),
		"bytesOut" : NumberLong("11800519939"),
		"numRequests" : NumberLong(21433498)
	},
	"opcounters" : {
		"insert" : 530103,
		"query" : 3156927,
		"update" : 869696,
		"delete" : 1352,
		"getmore" : 11992,
		"command" : 18052996
	},
	"sharding" : {
		"configsvrConnectionString" : "bitpool-cluster-config-1.bitpool.3137.mongodbdns.com:27020,bitpool-cluster-config-2.bitpool.3137.mongodbdns.com:27020,bitpool-cluster-config-3.bitpool.3137.mongodbdns.com:27025"
	},
	"tcmalloc" : {
		"generic" : {
			"current_allocated_bytes" : 3934760,
			"heap_size" : 144113664
		},
		"tcmalloc" : {
			"pageheap_free_bytes" : 102989824,
			"pageheap_unmapped_bytes" : 98304,
			"max_total_thread_cache_bytes" : NumberLong(1073741824),
			"current_total_thread_cache_bytes" : 18002440,
			"central_cache_free_bytes" : 6011216,
			"transfer_cache_free_bytes" : 13077120,
			"thread_cache_free_bytes" : 18002440,
			"aggressive_memory_decommit" : 0
		},
		"formattedString" : "------------------------------------------------\nMALLOC:        3934760 (    3.8 MiB) Bytes in use by application\nMALLOC: +    102989824 (   98.2 MiB) Bytes in page heap freelist\nMALLOC: +      6011216 (    5.7 MiB) Bytes in central cache freelist\nMALLOC: +     13077120 (   12.5 MiB) Bytes in transfer cache freelist\nMALLOC: +     18002440 (   17.2 MiB) Bytes in thread cache freelists\nMALLOC: +      2584736 (    2.5 MiB) Bytes in malloc metadata\nMALLOC:   ------------\nMALLOC: =    146600096 (  139.8 MiB) Actual memory used (physical + swap)\nMALLOC: +        98304 (    0.1 MiB) Bytes released to OS (aka unmapped)\nMALLOC:   ------------\nMALLOC: =    146698400 (  139.9 MiB) Virtual address space used\nMALLOC:\nMALLOC:           2313              Spans in use\nMALLOC:            286              Thread heaps in use\nMALLOC:           8192              Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
	},
	"mem" : {
		"bits" : 64,
		"resident" : 131,
		"virtual" : 608,
		"supported" : true
	},
	"metrics" : {
		"commands" : {
			"<UNKNOWN>" : NumberLong(5),
			"_isSelf" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(16674)
			},
			"aggregate" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(193351)
			},
			"buildInfo" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(2975968)
			},
			"connPoolStats" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(16674)
			},
			"count" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(60289)
			},
			"createIndexes" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(18806)
			},
			"delete" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(1352)
			},
			"find" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(36971)
			},
			"getCmdLineOpts" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(1836121)
			},
			"getLastError" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(1176001)
			},
			"getLog" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(16675)
			},
			"getMore" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(2120)
			},
			"getParameter" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(2856482)
			},
			"getnonce" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(191461)
			},
			"hostInfo" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(16674)
			},
			"insert" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(365766)
			},
			"isMaster" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(2659214)
			},
			"listCollections" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(82)
			},
			"listCommands" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(2)
			},
			"listDatabases" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(13)
			},
			"listIndexes" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(1)
			},
			"logRotate" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(10)
			},
			"netstat" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(16674)
			},
			"ping" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(4457409)
			},
			"replSetGetStatus" : {
				"failed" : NumberLong(86432),
				"total" : NumberLong(86432)
			},
			"serverStatus" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(221162)
			},
			"update" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(869696)
			},
			"whatsmyuri" : {
				"failed" : NumberLong(0),
				"total" : NumberLong(7)
			}
		},
		"cursor" : {
			"open" : {
				"multiTarget" : NumberLong(0),
				"singleTarget" : NumberLong(0),
				"pinned" : NumberLong(0),
				"total" : NumberLong(0)
			}
		},
		"getLastError" : {
			"wtime" : {
				"num" : 0,
				"totalMillis" : 0
			}
		}
	},
	"ok" : 1
}

Config from our admin:

{ 
   "_id" : "bp-shard_0", 
   "version" : NumberInt(13), 
   "members" : [
       {
           "_id" : NumberInt(2), 
           "host" : "bit-pool-cluster-1.bitpool.3137.mongodbdns.com:27010", 
           "arbiterOnly" : false, 
           "buildIndexes" : true, 
           "hidden" : false, 
           "priority" : NumberInt(2), 
           "tags" : {
 
           }, 
           "slaveDelay" : NumberLong(0), 
           "votes" : NumberInt(1)
       }, 
       {
           "_id" : NumberInt(4), 
           "host" : "bit-pool-cluster-3.bitpool.3137.mongodbdns.com:27010", 
           "arbiterOnly" : false, 
           "buildIndexes" : true, 
           "hidden" : false, 
           "priority" : NumberInt(1), 
           "tags" : {
 
           }, 
           "slaveDelay" : NumberLong(0), 
           "votes" : NumberInt(1)
       }, 
       {
           "_id" : NumberInt(5), 
           "host" : "smooth.bitpool.com:27010", 
           "arbiterOnly" : true, 
           "buildIndexes" : true, 
           "hidden" : false, 
           "priority" : NumberInt(0), 
           "tags" : {
 
           }, 
           "slaveDelay" : NumberLong(0), 
           "votes" : NumberInt(1)
       }
   ], 
   "settings" : {
       "chainingAllowed" : true, 
       "heartbeatIntervalMillis" : NumberInt(2000), 
       "heartbeatTimeoutSecs" : NumberInt(10), 
       "electionTimeoutMillis" : NumberInt(10000), 
       "getLastErrorModes" : {
 
       }, 
       "getLastErrorDefaults" : {
           "w" : NumberInt(1), 
           "wtimeout" : NumberInt(0)
       }
   }
}

Comment by Craig Wilson [ 28/Jan/16 ]

So, in the first couple of lines, the message in the exception includes the client's view of the cluster.

{ ClusterId : "3", 
  ConnectionMode : "Automatic", 
  Type : "Unknown", 
  State : "Disconnected", 
  Servers : [
    { ServerId: "{ ClusterId : 3, EndPoint : "Unspecified/mongo.bitpool.com:27017" }", 
      EndPoint: "Unspecified/mongo.bitpool.com:27017", 
      State: "Disconnected", 
      Type: "Unknown", 
      HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017

The message is cut off, presumably because it is too long and so I'll make an assumption that the other 2 servers are like the one we have. We have a cluster that is completely disconnected, and the server a mongo.bitpool.com is disconnected because of a SocketException:

"System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017".

So, down at the networking level (below the driver and in the .NET framework), we cannot make a connection to mongo.bitpool.com, which is identified by 10.1.5.184:27017.

There are some configuration items that could cause this to happen, and a lot of other network related things. Could you provide:
1. your connection string
2. your replica set configuration. You can get this by connecting to one of the replica set members using the shell and running rs.config().

Craig

Comment by Poltavets Andrey [ 28/Jan/16 ]

Hi Craig
I'm using amazon elastic beanstalk for our web api. Platform: 64bit Windows Server Core 2012 R2 v1.1.0 running IIS 8.5.
.net framework: 4.5.

Mongo cluster managed by cloud.mongodb.com. All instances hosted on amazon(linux based instances, not sure about distributive). 3 instances: 2 regular members and 1 arbiter. Mongo version 3.2.1, Storage Engine wiredTiger.

I'm using new async api.
Stack trace:

System.AggregateException
One or more errors occurred.
Inner Exception:
System.TimeoutException
A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServerSelector{ ReadPreference = { Mode = Primary, TagSets = [] } }, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "3", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 3, EndPoint : "Unspecified/mongo.bitpool.com:27017" }", EndPoint: "Unspecified/mongo.bitpool.com:27017", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.<ConnectAsync>d__7.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.<CreateStreamAsync>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.BinaryConnection.<OpenHelperAsync>d__47.MoveNext()
   --- End of inner exception stack trace ---
   at MongoDB.Driver.Core.Connections.BinaryConnection.<OpenHelperAsync>d__47.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at MongoDB.Driver.Core.Servers.ClusterableServer.<HeartbeatAsync>d__42.MoveNext()" }] }.
Stack Trace:
   at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
   at MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__44.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__37.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Bindings.ReadPreferenceBinding.<GetReadChannelSourceAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__107.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.OperationExecutor.<ExecuteReadOperationAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.MongoCollectionImpl`1.<ExecuteReadOperationAsync>d__59`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.IAsyncCursorSourceExtensions.<ToListAsync>d__16`1.MoveNext()
Source: MongoDB.Driver.Core
Custom data:
Stack Trace:
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at BitPool.Mongo.MongoRepository.Get(FilterDefinition`1 query)
   at BitPool.Server.OS.Repositories.WidgetRepository.GetWidgets(IEnumerable`1 guids)
   at BitPool.Server.REST.WebApi.Controllers.V1.BitPoolOS_V1Controller.GetWidgets(List`1 ids)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass10.<GetExecutor>b__9(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__0.MoveNext()
Source: mscorlib
Custom data:

Comment by Craig Wilson [ 27/Jan/16 ]

eternaldetroit@gmail.com,

Were you able to isolate the issue to driver or .NET framework (or something else). The exception you provided seems to indicate a problem outside the code-base. I've personally never seen this exception before.

Craig

Comment by Craig Wilson [ 27/Jan/16 ]

Hi Poltavets,

Information about your environment is extremely helpful, as John Hart provided above. Also, a stack trace of the exception would be great. Are you using the legacy API or the new API? Are you using async or sync?

Craig

Comment by Poltavets Andrey [ 26/Jan/16 ]

Hi Everyone.
I have same issue after update from 2.0.1 to 2.2.2.
I want to help fix this issue. How I can collect additional info for you?

Comment by John Hart [ 29/Nov/15 ]

I have been struggling with this issue with my production website for over a week now as well. We are an Azure-based website talking to Azure-based Mongolab cluster serving several million requests per day. Right now it looks like the problem pops up every 12 hours or so for us and then I have to trick Azure into switching VMs to clear things up. We are still using the legacy method of connecting and building queries, as we've not had the time to make the jump to the async methods yet.

Exception information:
Exception type: HttpException
Exception message: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/hostname" }", EndPoint: "Unspecified/hostname", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

I found this discussion about running out of ephemeral ports and it seems to ring true for us – once the problem occurs every other outbound call starts failing (3rd party APIs, Azure-backed Redis, Azure SQL). This has made tracking things down even harder since it points to all sorts of other services as being the issue. Only recently did we narrow things down to Mongo. More info on Azure port starving here: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/

I just switched back to the 2.0.1 driver as Carl Tremblay did and I will report back if the problem goes away for us. Like Thomas Lorimor we also upgraded to 4.6 recently – if the problem does show back up tomorrow I will downgrade the framework and try again.

Comment by Thomas Lorimor [ 09/Nov/15 ]

FWIW we have been hitting this on the 2.0.0.828 legacy driver. Interestingly, it appears this started around the time we updated to .Net 4.6. Around the same time we started hitting the above exception, we started hitting another seemingly unrelated issue - we were spending large amounts on time in the GC while running large PLINQ queries. Some googling turned these links up:
http://stackoverflow.com/questions/31747992/garbage-collection-and-parallel-foreach-issue-after-vs2015-upgrade
http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx

For us, we hit the above exception while in a PLINQ query that was querying MongoDB simultaneously on multiple threads.

Today we installed the hotfix for the GC bug (https://support.microsoft.com/en-us/kb/3088957) which has solved our PLINQ/GC issue. I will update back in a few days as to whether this caused this exception to go away for us.

Comment by Carl Tremblay [ 06/Nov/15 ]

Sample callstack (they kind of all look the same except on different collection operations)

An exception occurred while receiving a message from the server.
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveBufferAsync>d__45.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveBufferAsync>d__46.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveMessageAsync>d__47.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.WireProtocol.QueryWireProtocol`1.<ExecuteAsync>d__16.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Servers.ClusterableServer.ServerChannel.<ExecuteProtocolAsync>d__15`1.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__64.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at MongoDB.Driver.Sync.IOperationExtensions.Execute[TResult](IReadOperation`1 operation, IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCollection.ExecuteReadOperation[TResult](IReadOperation`1 operation, ReadPreference readPreference)
at MongoDB.Driver.MongoCollection.FindOneAs[TDocument](FindOneArgs args)

Comment by Craig Wilson [ 06/Nov/15 ]

Thanks Carl. That is incredibly helpful to know that it might be a change in 2.1.0. We'll focus there.

Comment by Carl Tremblay [ 06/Nov/15 ]

Hey guys,

We've been getting this issue non-stop (every 5-10 minutes) for 2 days after switching from 2.0.1 legacy to 2.1.0 legacy. We are deployed on Azure. Let me know if I can help. Since then, we switched back to 2.0.1 and the issue is gone.

Comment by Craig Wilson [ 05/Nov/15 ]

Thanks Thomas,

If you guys end up with a reproducible situation, let us know. We are tracking something down right now that looks somewhat similar as well that is now surfacing inside some of our integration tests.

Craig

Comment by Thomas Lorimor [ 05/Nov/15 ]

Hi Craig, here at Pathmatics (formerly Adomic (formerly YieldMetrics)) we just started hitting this in just the last few nights. It's occurred several times though we can't reproduce this consistently. Below is the exception we're hittting while hitting our db on multiple parallel tasks.

This is using the 2.0.0.828 driver talking to a 2.6.4 cluster hosted by MongoLab.

We're actively troubleshooting this issue.

-Tom

---> (Inner Exception #4) System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector

{ AllowedLatencyRange = 00:00:00.0150000 }

}. Client view of cluster state is { ClusterId : "2", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "

{ ClusterId : 2, EndPoint : "Unspecified/XXXXXXXXX-a4.ffs54.fleet.mongolab.com:29017" }

", EndPoint: "Unspecified/XXXXXXXXXX-a4.ffs54.fleet.mongolab.com:29017", State: "Disconnected", Type: "Unknown" }, { ServerId: "

{ ClusterId : 2, EndPoint : "Unspecified/XXXXXXXXX-a5x.ffs54.fleet.mongolab.com:29017" }

", EndPoint: "Unspecified/XXXXXXXXX-a5x.ffs54.fleet.mongolab.com:29017", State: "Disconnected", Type: "Unknown" }] }.
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
at MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__18.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__6.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Bindings.WritableServerBinding.<GetChannelSourceAsync>d__0.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__2.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Sync.IOperationExtensions.Execute[TResult](IReadOperation`1 operation, IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCursor`1.GetEnumerator()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at RadarDataLayer.MongoSnapshotRepository.GetFirstSnapshotUrls(IEnumerable`1 creativeHashesEnumerable) in x:\Radar\RadarDataLayer\RadarDataLayer\MongoSnapshotRepository.cs:line 149
at System.Linq.Parallel.SelectQueryOperator`2.SelectQueryOperatorEnumerator`1.MoveNext(TOutput& currentElement, TKey& currentKey)
at System.Linq.Parallel.SelectManyQueryOperator`3.SelectManyQueryOperatorEnumerator`1.MoveNext(TOutput& currentElement, Pair`2& currentKey)
at System.Linq.Parallel.StopAndGoSpoolingTask`2.SpoolingWork()
at System.Linq.Parallel.SpoolingTaskBase.Work()
at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
at System.Threading.Tasks.Task.Execute()<---

Comment by Craig Wilson [ 28/Oct/15 ]

Marc-Andre I have taken your sample code, changed it a little to get it to compile, and have been unsuccessful in reproducing the issue.

I must comment on the code you've provided though. First, it's inadvisable to use async void methods. With the above code as is, it runs up to somewhere between 65 and 75 connections needed. However, when returning Tasks, it only uses 51 connections. 50 connections is exactly what I would expect from code attempting to parallelize with 50 calls. The extra connection is the internal monitoring one, so, using Tasks, we get exactly the parallelism we were going for.

If you need greater parallelism than 500, then you should increase the waitQueueSize and potentially even the max number of connections.

Comment by Craig Wilson [ 23/Oct/15 ]

Thanks Marc-Andre, I'll start playing with this and hopefully will get the same results and let you know how it goes.

Craig

Comment by Marc-Andre Ruel [X] [ 23/Oct/15 ]

I have reproduced the bug with the fallowing code.

async public void Test_MultipleWrites()
        {
            var mongoClient = ClientSingleton.getInstance(_connectionString);
            for (var i = 0; i < 1000; i++)
            {
                var col = mongoClient.GetDatabase("_Tests").GetCollection<TDocument>("tests");
                await col.InsertOneAsync(new TDocument { Id = Guid.NewGuid().ToString(), Name = Guid.NewGuid().ToString() });
 
                Thread.Sleep(1);
            }
        }
 
        public void Test_MultipleWritesAndReadConcurrent()
        {
            var mongoClient = ClientSingleton.getInstance(_connectionString);
            Parallel.For(0, 50, (i, j) =>
            {
                Test_MultipleWrites();
                var resultQuery = mongoClient.GetDatabase("_Tests").GetCollection<TDocument>("tests").AsQueryable<TDocument>().Where(p => p.Name.Contains("ec")).ToList();
            });
        }
 
        async public void Test_MultipleWritesConcurrent()
        {
            var mongoClient = ClientSingleton.getInstance(_connectionString);
            Parallel.For(0, 50, (i, j) => { Test_MultipleWrites(); });
            await mongoClient.GetDatabase("_Tests").DropCollectionAsync("tests");
        }

I have simply started a replica set on my local machine on 3 different ports with the fallowing connection string: mongodb://localhost:27017/?replicaSet=rs;readPreference=primary

I used the c# driver from NuGet, version 2.1.0.145

Comment by Luca casali [ 08/Oct/15 ]

Hi Craig we are getting data from our Zabbix system. Network analysis and
mongo read write....
I'll get you back hope tomorrow!

Comment by Craig Wilson [ 08/Oct/15 ]

Hi Luca,

I'd love to have some more information on this. I have attempted to reproduce this problem without success. I have hammered a sharded cluster with load tests over an extended period of time and have been unable to force the driver to break. I'm not saying there isn't a problem, just that I haven't been able to reproduce it without further information.

Craig

Comment by Craig Wilson [ 07/Oct/15 ]

I just saw your "all threads are blocked" statement... Could you elaborate on what you mean? Perhaps some example code from one or your controller actions?

Comment by Craig Wilson [ 07/Oct/15 ]

Hi Luca,

1.10 is sort-of maintained for critical bug fixes. New features may or may not show up there.

I'm attempting reproduce your heavy-loading problem. So far, I haven't had any luck. Could you give me a profile of the type of load you are pushing on it? Number of inserts/updates/deletes/reads? How many per secondy, etc...?

Craig

Comment by Luca casali [ 05/Oct/15 ]

Hi Graig,

we asked to our Service Provider to check all connection and we have a Monitoring system (zabbix) that is not reporting any network issue during that period. So our IT exclude any network issue.

Can you also tell me if we are moving back to 1.10 is this version of driver maintained?

We currently can avoid async await that we are using only in small part of our SW.

Luca

Comment by Craig Wilson [ 05/Oct/15 ]

Something certainly happened at 2015-10-05T10:16:06.273+0100. A little later, it seems like the problem is resolved, and then immediately goes back into decline again. This seems to happen over and over again. It seems like it was resolved around 2015-10-05T14:19:32.372+0100.

Can you talk with your network guys and see if anything popped up with them? Attempting to get this to reproduce is going to be difficult, and the logs don't show enough to help me in a repro. It might be a bug in the driver. It also might not be, which makes this difficult to diagnose. I'll continue under the premise this is a bug until I've exhausted all options.

Comment by Luca casali [ 05/Oct/15 ]

We Started having problem beetween 10:15 and 14:00.

Now we swtiched back to 1.10.1 and it is working but of course we cannot use async await.

I've attached mongos logs.

We analyzed our logs and we can confirm that we haven't any network problem during that timeframe

Comment by Craig Wilson [ 05/Oct/15 ]

Hi Luca,

Can you upload your mongos logs from the time when this happened? Also, did you notice any networking issues during this time?

Craig

Comment by Luca casali [ 05/Oct/15 ]

Mongos are on window

Generated at Wed Feb 07 21:39:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.