I was actually able to solve this issue by upgrading the driver from source the same day CSHARP-1515 was resolved. I was able to reproduce the problem within minutes with the previous version, but I haven't seen the issue since that fix.
|
|
Connection string:
mongodb://mongo.bitpool.com:27017
rs.config() throws this error:
2016-01-29T10:31:21.231+0600 E QUERY [thread1] Error: error: { "$err" : "can't use 'local' database through mongos", "code" : 20 } :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
DBCommandCursor@src/mongo/shell/query.js:679:1
DBQuery.prototype._exec@src/mongo/shell/query.js:105:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:267:5
DBCollection.prototype.findOne@src/mongo/shell/collection.js:215:12
rs.conf@src/mongo/shell/utils.js:1090:16
@(shell):1:1
db.serverStatus() result:
{
    "host" : "bit-pool-cluster-3",
    "advisoryHostFQDNs" : [
        "bit-pool-cluster-3.bitpool.3137.mongodbdns.com"
    ],
    "version" : "3.2.1",
    "process" : "mongos",
    "pid" : NumberLong(27918),
    "uptime" : 919274,
    "uptimeMillis" : NumberLong(919274741),
    "uptimeEstimate" : 777110,
    "localTime" : ISODate("2016-01-29T05:02:35.758Z"),
    "asserts" : {
        "regular" : 0,
        "warning" : 0,
        "msg" : 0,
        "user" : 7,
        "rollovers" : 0
    },
    "connections" : {
        "current" : 266,
        "available" : 50934,
        "totalCreated" : NumberLong(217250)
    },
    "extra_info" : {
        "note" : "fields vary by platform",
        "heap_usage_bytes" : 3934760,
        "page_faults" : 138
    },
    "network" : {
        "bytesIn" : NumberLong(2046862258),
        "bytesOut" : NumberLong("11800519939"),
        "numRequests" : NumberLong(21433498)
    },
    "opcounters" : {
        "insert" : 530103,
        "query" : 3156927,
        "update" : 869696,
        "delete" : 1352,
        "getmore" : 11992,
        "command" : 18052996
    },
    "sharding" : {
        "configsvrConnectionString" : "bitpool-cluster-config-1.bitpool.3137.mongodbdns.com:27020,bitpool-cluster-config-2.bitpool.3137.mongodbdns.com:27020,bitpool-cluster-config-3.bitpool.3137.mongodbdns.com:27025"
    },
    "tcmalloc" : {
        "generic" : {
            "current_allocated_bytes" : 3934760,
            "heap_size" : 144113664
        },
        "tcmalloc" : {
            "pageheap_free_bytes" : 102989824,
            "pageheap_unmapped_bytes" : 98304,
            "max_total_thread_cache_bytes" : NumberLong(1073741824),
            "current_total_thread_cache_bytes" : 18002440,
            "central_cache_free_bytes" : 6011216,
            "transfer_cache_free_bytes" : 13077120,
            "thread_cache_free_bytes" : 18002440,
            "aggressive_memory_decommit" : 0
        },
        "formattedString" : "------------------------------------------------\nMALLOC: 3934760 ( 3.8 MiB) Bytes in use by application\nMALLOC: + 102989824 ( 98.2 MiB) Bytes in page heap freelist\nMALLOC: + 6011216 ( 5.7 MiB) Bytes in central cache freelist\nMALLOC: + 13077120 ( 12.5 MiB) Bytes in transfer cache freelist\nMALLOC: + 18002440 ( 17.2 MiB) Bytes in thread cache freelists\nMALLOC: + 2584736 ( 2.5 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 146600096 ( 139.8 MiB) Actual memory used (physical + swap)\nMALLOC: + 98304 ( 0.1 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 146698400 ( 139.9 MiB) Virtual address space used\nMALLOC:\nMALLOC: 2313 Spans in use\nMALLOC: 286 Thread heaps in use\nMALLOC: 8192 Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
    },
    "mem" : {
        "bits" : 64,
        "resident" : 131,
        "virtual" : 608,
        "supported" : true
    },
    "metrics" : {
        "commands" : {
            "<UNKNOWN>" : NumberLong(5),
            "_isSelf" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16674)
            },
            "aggregate" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(193351)
            },
            "buildInfo" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2975968)
            },
            "connPoolStats" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16674)
            },
            "count" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(60289)
            },
            "createIndexes" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(18806)
            },
            "delete" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(1352)
            },
            "find" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(36971)
            },
            "getCmdLineOpts" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(1836121)
            },
            "getLastError" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(1176001)
            },
            "getLog" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16675)
            },
            "getMore" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2120)
            },
            "getParameter" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2856482)
            },
            "getnonce" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(191461)
            },
            "hostInfo" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16674)
            },
            "insert" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(365766)
            },
            "isMaster" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2659214)
            },
            "listCollections" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(82)
            },
            "listCommands" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2)
            },
            "listDatabases" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(13)
            },
            "listIndexes" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(1)
            },
            "logRotate" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(10)
            },
            "netstat" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16674)
            },
            "ping" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(4457409)
            },
            "replSetGetStatus" : {
                "failed" : NumberLong(86432),
                "total" : NumberLong(86432)
            },
            "serverStatus" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(221162)
            },
            "update" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(869696)
            },
            "whatsmyuri" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(7)
            }
        },
        "cursor" : {
            "open" : {
                "multiTarget" : NumberLong(0),
                "singleTarget" : NumberLong(0),
                "pinned" : NumberLong(0),
                "total" : NumberLong(0)
            }
        },
        "getLastError" : {
            "wtime" : {
                "num" : 0,
                "totalMillis" : 0
            }
        }
    },
    "ok" : 1
}
Config from our admin:
{
    "_id" : "bp-shard_0",
    "version" : NumberInt(13),
    "members" : [
        {
            "_id" : NumberInt(2),
            "host" : "bit-pool-cluster-1.bitpool.3137.mongodbdns.com:27010",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : NumberInt(2),
            "tags" : {
            },
            "slaveDelay" : NumberLong(0),
            "votes" : NumberInt(1)
        },
        {
            "_id" : NumberInt(4),
            "host" : "bit-pool-cluster-3.bitpool.3137.mongodbdns.com:27010",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : NumberInt(1),
            "tags" : {
            },
            "slaveDelay" : NumberLong(0),
            "votes" : NumberInt(1)
        },
        {
            "_id" : NumberInt(5),
            "host" : "smooth.bitpool.com:27010",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : NumberInt(0),
            "tags" : {
            },
            "slaveDelay" : NumberLong(0),
            "votes" : NumberInt(1)
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : NumberInt(2000),
        "heartbeatTimeoutSecs" : NumberInt(10),
        "electionTimeoutMillis" : NumberInt(10000),
        "getLastErrorModes" : {
        },
        "getLastErrorDefaults" : {
            "w" : NumberInt(1),
            "wtimeout" : NumberInt(0)
        }
    }
}
|
|
|
So, in the first couple of lines, the exception message includes the client's view of the cluster:
{ ClusterId : "3",
  ConnectionMode : "Automatic",
  Type : "Unknown",
  State : "Disconnected",
  Servers : [
    { ServerId: "{ ClusterId : 3, EndPoint : "Unspecified/mongo.bitpool.com:27017" }",
      EndPoint: "Unspecified/mongo.bitpool.com:27017",
      State: "Disconnected",
      Type: "Unknown",
      HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017
The message is cut off, presumably because it is too long, so I'll assume the other 2 servers look like the one we can see. We have a cluster that is completely disconnected, and the server at mongo.bitpool.com is disconnected because of a SocketException:
"System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017".
So, down at the networking level (below the driver, in the .NET framework), we cannot make a connection to mongo.bitpool.com, which resolves to 10.1.5.184:27017.
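One quick way to confirm that the failure is below the driver is to attempt a raw TCP connect to the same endpoint from the application machines. A minimal sketch, assuming only the host and port from the exception above (the 10-second timeout is arbitrary):
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class TcpReachabilityCheck
{
    // Attempts a plain TCP connect, bypassing the MongoDB driver entirely.
    public static async Task CheckAsync(string host, int port)
    {
        using (var client = new TcpClient())
        {
            var connect = client.ConnectAsync(host, port);
            var finished = await Task.WhenAny(connect, Task.Delay(TimeSpan.FromSeconds(10)));
            if (finished != connect)
            {
                Console.WriteLine("Timed out connecting to {0}:{1}", host, port);
            }
            else if (connect.IsFaulted)
            {
                Console.WriteLine("Connect failed: {0}", connect.Exception.InnerException.Message);
            }
            else
            {
                Console.WriteLine("TCP connect to {0}:{1} succeeded", host, port);
            }
        }
    }
}

// Usage: TcpReachabilityCheck.CheckAsync("mongo.bitpool.com", 27017).Wait();
If this also fails from the machines running the application, the problem is environmental (DNS, firewall, security groups) rather than in the driver.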
There are some configuration items that could cause this to happen, as well as a lot of other network-related things. Could you provide:
1. your connection string
2. your replica set configuration. You can get this by connecting to one of the replica set members using the shell and running rs.config().
Craig
|
|
Hi Craig,
I'm using Amazon Elastic Beanstalk for our web API. Platform: 64-bit Windows Server Core 2012 R2 v1.1.0 running IIS 8.5.
.NET Framework: 4.5.
The Mongo cluster is managed by cloud.mongodb.com. All instances are hosted on Amazon (Linux-based instances; not sure about the distribution). 3 instances: 2 regular members and 1 arbiter. Mongo version 3.2.1, storage engine WiredTiger.
I'm using the new async API.
Stack trace:
System.AggregateException
One or more errors occurred.
Inner Exception:
System.TimeoutException
A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServerSelector{ ReadPreference = { Mode = Primary, TagSets = [] } }, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "3", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 3, EndPoint : "Unspecified/mongo.bitpool.com:27017" }", EndPoint: "Unspecified/mongo.bitpool.com:27017", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.5.184:27017
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.<ConnectAsync>d__7.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.<CreateStreamAsync>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Connections.BinaryConnection.<OpenHelperAsync>d__47.MoveNext()
--- End of inner exception stack trace ---
   at MongoDB.Driver.Core.Connections.BinaryConnection.<OpenHelperAsync>d__47.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at MongoDB.Driver.Core.Servers.ClusterableServer.<HeartbeatAsync>d__42.MoveNext()" }] }.
Stack Trace:
   at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
   at MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__44.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__37.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Bindings.ReadPreferenceBinding.<GetReadChannelSourceAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__107.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.OperationExecutor.<ExecuteReadOperationAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.MongoCollectionImpl`1.<ExecuteReadOperationAsync>d__59`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MongoDB.Driver.IAsyncCursorSourceExtensions.<ToListAsync>d__16`1.MoveNext()
Source: MongoDB.Driver.Core
Custom data:
Stack Trace:
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at BitPool.Mongo.MongoRepository.Get(FilterDefinition`1 query)
   at BitPool.Server.OS.Repositories.WidgetRepository.GetWidgets(IEnumerable`1 guids)
   at BitPool.Server.REST.WebApi.Controllers.V1.BitPoolOS_V1Controller.GetWidgets(List`1 ids)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass10.<GetExecutor>b__9(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__0.MoveNext()
Source: mscorlib
Custom data:
|
|
|
eternaldetroit@gmail.com,
Were you able to isolate the issue to the driver or the .NET Framework (or something else)? The exception you provided seems to indicate a problem outside the codebase. I've personally never seen this exception before.
Craig
|
|
Hi Poltavets,
Information about your environment, like John Hart provided above, is extremely helpful. Also, a stack trace of the exception would be great. Are you using the legacy API or the new API? Are you using async or sync?
Craig
|
|
Hi everyone,
I have the same issue after updating from 2.0.1 to 2.2.2.
I want to help fix this issue. How can I collect additional info for you?
|
|
I have been struggling with this issue with my production website for over a week now as well. We are an Azure-based website talking to an Azure-based MongoLab cluster, serving several million requests per day. Right now the problem pops up every 12 hours or so, and then I have to trick Azure into switching VMs to clear things up. We are still using the legacy method of connecting and building queries, as we haven't had the time to make the jump to the async methods yet.
Exception information:
Exception type: HttpException
Exception message: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/hostname" }", EndPoint: "Unspecified/hostname", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
I found this discussion about running out of ephemeral ports, and it seems to ring true for us: once the problem occurs, other outbound calls start failing as well (3rd-party APIs, Azure-backed Redis, Azure SQL). This has made tracking things down even harder, since it points to all sorts of other services as being the issue. Only recently did we narrow things down to Mongo. More info on Azure port starvation here: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/
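If port starvation is the culprit, one general mitigation (independent of driver version) is to make sure the application shares a single MongoClient, and therefore a single connection pool, for the whole process instead of constructing clients per request. A minimal sketch; the class name and connection string are illustrative, not from our code:
using MongoDB.Driver;

public static class MongoConnection
{
    // One client per process: MongoClient is thread-safe and pools connections internally,
    // so reusing it keeps the number of outbound sockets bounded by maxPoolSize.
    private static readonly IMongoClient _client =
        new MongoClient("mongodb://hostname:27017/?maxPoolSize=100");

    public static IMongoDatabase GetDatabase(string name)
    {
        return _client.GetDatabase(name);
    }
}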
I just switched back to the 2.0.1 driver, as Carl Tremblay did, and I will report back on whether the problem goes away for us. Like Thomas Lorimor, we also upgraded to 4.6 recently; if the problem does show back up tomorrow, I will downgrade the framework and try again.
|
|
FWIW, we have been hitting this on the 2.0.0.828 legacy driver. Interestingly, it appears this started around the time we updated to .NET 4.6. Around the same time we started hitting the above exception, we also started hitting another, seemingly unrelated issue: we were spending large amounts of time in the GC while running large PLINQ queries. Some googling turned up these links:
http://stackoverflow.com/questions/31747992/garbage-collection-and-parallel-foreach-issue-after-vs2015-upgrade
http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx
For us, we hit the above exception while in a PLINQ query that was querying MongoDB simultaneously on multiple threads.
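For anyone trying to picture it: roughly a PLINQ pipeline fanning out over a repository that issues synchronous driver calls. The interface and signature below are assumptions for illustration; only the GetFirstSnapshotUrls name comes from the callstack further down.
using System.Collections.Generic;
using System.Linq;

// Assumed repository shape; not the actual code from this report.
public interface ISnapshotRepository
{
    IEnumerable<string> GetFirstSnapshotUrls(IEnumerable<string> creativeHashes);
}

public static class SnapshotQueries
{
    // Each PLINQ partition runs on its own thread and issues blocking MongoDB queries.
    public static List<string> GetUrlsInParallel(ISnapshotRepository repo, IEnumerable<string> hashes)
    {
        return hashes
            .AsParallel()
            .WithDegreeOfParallelism(8)
            .SelectMany(h => repo.GetFirstSnapshotUrls(new[] { h }))
            .ToList();
    }
}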
Today we installed the hotfix for the GC bug (https://support.microsoft.com/en-us/kb/3088957), which has solved our PLINQ/GC issue. I will report back in a few days on whether it also made this exception go away for us.
|
|
Sample callstack (they all look roughly the same, just on different collection operations):
An exception occurred while receiving a message from the server.
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveBufferAsync>d__45.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveBufferAsync>d__46.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Connections.BinaryConnection.<ReceiveMessageAsync>d__47.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.WireProtocol.QueryWireProtocol`1.<ExecuteAsync>d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Servers.ClusterableServer.ServerChannel.<ExecuteProtocolAsync>d__15`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__64.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at MongoDB.Driver.Sync.IOperationExtensions.Execute[TResult](IReadOperation`1 operation, IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCollection.ExecuteReadOperation[TResult](IReadOperation`1 operation, ReadPreference readPreference)
at MongoDB.Driver.MongoCollection.FindOneAs[TDocument](FindOneArgs args)
|
|
Thanks Carl. That is incredibly helpful to know that it might be a change in 2.1.0. We'll focus there.
|
|
Hey guys,
We've been getting this issue non-stop (every 5-10 minutes) for 2 days after switching from 2.0.1 legacy to 2.1.0 legacy. We are deployed on Azure. Let me know if I can help. Since then, we switched back to 2.0.1 and the issue is gone.
|
|
Thanks Thomas,
If you guys end up with a reproducible situation, let us know. We are tracking something down right now that looks somewhat similar as well that is now surfacing inside some of our integration tests.
Craig
|
|
Hi Craig, here at Pathmatics (formerly Adomic, formerly YieldMetrics) we just started hitting this in the last few nights. It has occurred several times, though we can't reproduce it consistently. Below is the exception we're hitting while querying our db on multiple parallel tasks.
This is using the 2.0.0.828 driver talking to a 2.6.4 cluster hosted by MongoLab.
We're actively troubleshooting this issue.
-Tom
---> (Inner Exception #4) System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "2", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 2, EndPoint : "Unspecified/XXXXXXXXX-a4.ffs54.fleet.mongolab.com:29017" }", EndPoint: "Unspecified/XXXXXXXXXX-a4.ffs54.fleet.mongolab.com:29017", State: "Disconnected", Type: "Unknown" }, { ServerId: "{ ClusterId : 2, EndPoint : "Unspecified/XXXXXXXXX-a5x.ffs54.fleet.mongolab.com:29017" }", EndPoint: "Unspecified/XXXXXXXXX-a5x.ffs54.fleet.mongolab.com:29017", State: "Disconnected", Type: "Unknown" }] }.
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
at MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Bindings.WritableServerBinding.<GetChannelSourceAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MongoDB.Driver.Sync.IOperationExtensions.Execute[TResult](IReadOperation`1 operation, IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCursor`1.GetEnumerator()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at RadarDataLayer.MongoSnapshotRepository.GetFirstSnapshotUrls(IEnumerable`1 creativeHashesEnumerable) in x:\Radar\RadarDataLayer\RadarDataLayer\MongoSnapshotRepository.cs:line 149
at System.Linq.Parallel.SelectQueryOperator`2.SelectQueryOperatorEnumerator`1.MoveNext(TOutput& currentElement, TKey& currentKey)
at System.Linq.Parallel.SelectManyQueryOperator`3.SelectManyQueryOperatorEnumerator`1.MoveNext(TOutput& currentElement, Pair`2& currentKey)
at System.Linq.Parallel.StopAndGoSpoolingTask`2.SpoolingWork()
at System.Linq.Parallel.SpoolingTaskBase.Work()
at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
at System.Threading.Tasks.Task.Execute()<---
|
|
Marc-Andre, I have taken your sample code, changed it a little to get it to compile, and have been unable to reproduce the issue.
I must comment on the code you've provided, though. First, it's inadvisable to use async void methods. With the above code as-is, it ends up needing somewhere between 65 and 75 connections. However, when the methods return Tasks, it only uses 51 connections. 50 connections is exactly what I would expect from code attempting to parallelize 50 calls; the extra connection is the internal monitoring one, so, using Tasks, we get exactly the parallelism we were going for.
If you need greater parallelism than 500, then you should increase the waitQueueSize and potentially even the max number of connections.
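For reference, a minimal sketch of the Task-returning shape (TDocument mirrors the document type in the repro below; the pool settings at the end are only examples of the knobs mentioned above, not recommended values):
using System;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Driver;

public class TDocument
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class ParallelWriteSketch
{
    // 50 concurrent writers share the client's pool, so the driver needs ~51 connections
    // (50 workers plus the internal monitoring connection).
    public static async Task RunAsync(IMongoClient client)
    {
        var collection = client.GetDatabase("_Tests").GetCollection<TDocument>("tests");

        var writers = Enumerable.Range(0, 50).Select(async _ =>
        {
            for (var i = 0; i < 1000; i++)
            {
                await collection.InsertOneAsync(
                    new TDocument { Id = Guid.NewGuid().ToString(), Name = Guid.NewGuid().ToString() });
            }
        });

        // Awaiting the tasks (instead of async void) surfaces exceptions and bounds the parallelism.
        await Task.WhenAll(writers);
    }

    // Example of raising the limits if more parallelism than the defaults is needed.
    public static IMongoClient CreateClientWithLargerPool(string connectionString)
    {
        var settings = MongoClientSettings.FromUrl(new MongoUrl(connectionString));
        settings.MaxConnectionPoolSize = 200;   // default is 100
        settings.WaitQueueSize = 2000;          // default is maxPoolSize * 5
        return new MongoClient(settings);
    }
}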
|
|
Thanks Marc-Andre, I'll start playing with this and hopefully will get the same results and let you know how it goes.
Craig
|
|
I have reproduced the bug with the following code:
async public void Test_MultipleWrites()
{
    var mongoClient = ClientSingleton.getInstance(_connectionString);
    for (var i = 0; i < 1000; i++)
    {
        var col = mongoClient.GetDatabase("_Tests").GetCollection<TDocument>("tests");
        await col.InsertOneAsync(new TDocument { Id = Guid.NewGuid().ToString(), Name = Guid.NewGuid().ToString() });

        Thread.Sleep(1);
    }
}

public void Test_MultipleWritesAndReadConcurrent()
{
    var mongoClient = ClientSingleton.getInstance(_connectionString);
    Parallel.For(0, 50, (i, j) =>
    {
        Test_MultipleWrites();
        var resultQuery = mongoClient.GetDatabase("_Tests").GetCollection<TDocument>("tests").AsQueryable<TDocument>().Where(p => p.Name.Contains("ec")).ToList();
    });
}

async public void Test_MultipleWritesConcurrent()
{
    var mongoClient = ClientSingleton.getInstance(_connectionString);
    Parallel.For(0, 50, (i, j) => { Test_MultipleWrites(); });
    await mongoClient.GetDatabase("_Tests").DropCollectionAsync("tests");
}
I have simply started a replica set on my local machine on 3 different ports, with the following connection string: mongodb://localhost:27017/?replicaSet=rs;readPreference=primary
I used the C# driver from NuGet, version 2.1.0.145.
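(ClientSingleton and TDocument are not included above; for anyone trying to run the repro, an assumed minimal shape would be something like the following.)
using MongoDB.Driver;

// Assumed shape of the helpers referenced in the repro; not the reporter's actual code.
public class TDocument
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class ClientSingleton
{
    private static readonly object _lock = new object();
    private static IMongoClient _instance;

    public static IMongoClient getInstance(string connectionString)
    {
        lock (_lock)
        {
            if (_instance == null)
            {
                _instance = new MongoClient(connectionString);
            }
            return _instance;
        }
    }
}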
|
|
Hi Craig, we are gathering data from our Zabbix system: network analysis and mongo read/write activity.
I hope to get back to you tomorrow!
|
|
Hi Luca,
I'd love to have some more information on this. I have attempted to reproduce this problem without success. I have hammered a sharded cluster with load tests over an extended period of time and have been unable to force the driver to break. I'm not saying there isn't a problem, just that I haven't been able to reproduce it without further information.
Craig
|
|
I just saw your "all threads are blocked" statement... Could you elaborate on what you mean? Perhaps some example code from one of your controller actions?
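For example, if controller actions block on the async API (e.g. calling .Result) rather than awaiting it, request threads get tied up while the driver works and the thread pool can stall under load. A sketch of the two shapes, with illustrative controller and collection names (not taken from your code):
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Web.Http;
using MongoDB.Bson;
using MongoDB.Driver;

public class WidgetsController : ApiController
{
    private readonly IMongoCollection<BsonDocument> _widgets;

    public WidgetsController(IMongoCollection<BsonDocument> widgets)
    {
        _widgets = widgets;
    }

    // Blocking shape: .Result holds the request thread while the query is in flight,
    // which under load can leave no threads free to complete the pending I/O.
    public List<BsonDocument> GetBlocking()
    {
        return _widgets.Find(FilterDefinition<BsonDocument>.Empty).ToListAsync().Result;
    }

    // Async shape: the request thread is released while the query is in flight.
    public async Task<List<BsonDocument>> GetAsync()
    {
        return await _widgets.Find(FilterDefinition<BsonDocument>.Empty).ToListAsync();
    }
}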
|
|
Hi Luca,
1.10 is sort-of maintained for critical bug fixes. New features may or may not show up there.
I'm attempting to reproduce your heavy-loading problem. So far, I haven't had any luck. Could you give me a profile of the type of load you are pushing on it? Number of inserts/updates/deletes/reads? How many per second, etc.?
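If it helps, one way to capture that profile is to diff the opcounters section of serverStatus between two samples; a rough sketch (field names as in the serverStatus output earlier in this ticket, interval and formatting arbitrary):
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class LoadProfiler
{
    // Samples serverStatus twice and prints per-second rates from the opcounters deltas.
    public static async Task PrintOpsPerSecondAsync(IMongoClient client, TimeSpan interval)
    {
        var admin = client.GetDatabase("admin");
        var command = new BsonDocumentCommand<BsonDocument>(new BsonDocument("serverStatus", 1));

        var first = (await admin.RunCommandAsync(command))["opcounters"].AsBsonDocument;
        await Task.Delay(interval);
        var second = (await admin.RunCommandAsync(command))["opcounters"].AsBsonDocument;

        foreach (var name in new[] { "insert", "query", "update", "delete", "getmore", "command" })
        {
            var delta = second[name].ToInt64() - first[name].ToInt64();
            Console.WriteLine("{0}: {1:F1}/s", name, delta / interval.TotalSeconds);
        }
    }
}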
Craig
|
|
Hi Craig,
we asked our service provider to check all connections, and we have a monitoring system (Zabbix) that is not reporting any network issues during that period. So our IT rules out any network issue.
Can you also tell me: if we move back to 1.10, is that version of the driver maintained?
We can currently avoid async/await, which we are using in only a small part of our software.
Luca
|
|
Something certainly happened at 2015-10-05T10:16:06.273+0100. A little later, it seems like the problem is resolved, and then immediately goes back into decline again. This seems to happen over and over again. It seems like it was resolved around 2015-10-05T14:19:32.372+0100.
Can you talk with your network guys and see if anything popped up with them? Attempting to get this to reproduce is going to be difficult, and the logs don't show enough to help me build a repro. It might be a bug in the driver. It also might not be, which makes this difficult to diagnose. I'll continue under the premise that this is a bug until I've exhausted all options.
|
|
We started having problems between 10:15 and 14:00.
We have now switched back to 1.10.1 and it is working, but of course we cannot use async/await.
I've attached the mongos logs.
We analyzed our logs and can confirm that we did not have any network problems during that timeframe.
|
|
Hi Luca,
Can you upload your mongos logs from the time when this happened? Also, did you notice any networking issues during this time?
Craig
|
|
The mongos instances are on Windows.