[CSHARP-2477] MongoDb Driver hangs out for really long time when mongos going down Created: 11/Jan/19  Updated: 25/Jun/20  Resolved: 25/Jun/20

Status: Closed
Project: C# Driver
Component/s: Command Operations
Affects Version/s: 2.7.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jakub Dropia Assignee: Wan Bachtiar
Resolution: Done Votes: 0
Labels: wb
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu Server 18.04 LTS VM on Azure


Case:

 Description   

I have sharded cluster with auto-scale on scale sets in Azure. This error seems to happen, when my client writes to Mongo, cluster has 3 mongos, 2 replica-sets. Client writes with high possible throughput using many threads and connections.
Additional settings:

settings.MaxConnectionPoolSize = 2000;
settings.WaitQueueSize = 10000;

Then one mongos is going down (because scale set decide to shut it down).
First i gets errors of this kind:

"[2019-01-10 17:18:56.130 GMT 28 Error 0 Lingaro.OpenIoT.Orleans.Grains.DeviceGrain] !!!!!!!!!! Error during grain execution! Setting previous state
Exc level 0: MongoDB.Driver.MongoWriteException: A write operation resulted in an error.
  Write results unavailable from mongo-s1-2:27017 due to interrupted at shutdown
   at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WriteStateAsync() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 108
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.<Process>b__5_0() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 92
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WrapInTimeout[T](String actionName, TimeSpan timeout, Func`1 action) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 141
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.Process(EventHubMessage eventHubMessage) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 90
Exc level 1: MongoDB.Driver.MongoBulkWriteException`1[MongoDB.Bson.BsonDocument]: A bulk write operation resulted in one or more errors.
  Write results unavailable from mongo-s1-2:27017 due to interrupted at shutdown
   at MongoDB.Driver.MongoCollectionImpl`1.BulkWriteAsync(IClientSessionHandle session, IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)"

And it seems reasonable as one mongos is shutting down.
But then after working a while, whole process hang out. After a really long time i gets something like this:

"[2019-01-10 17:35:06.509 GMT 28Error0Lingaro.OpenIoT.Orleans.Grains.DeviceGrain]!!!!!!!!!! Operation WriteStateAsync took long time to complete! Took 954750.3796 ms.
[2019-01-10 17:35:06.508 GMT 27Error0Lingaro.OpenIoT.Orleans.Grains.DeviceGrain]!!!!!!!!!! Error during grain execution! Setting previous state
Exc level 0: MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server.
   at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.Server.GetChannelAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableWriteContext.InitializeAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableWriteContext.CreateAsync(IWriteBinding binding, Boolean retryRequested, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.BulkMixedWriteOperation.ExecuteAsync(IWriteBinding binding, CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.ExecuteWriteOperationAsync[TResult](IWriteBinding binding, IWriteOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteWriteOperationAsync[TResult](IClientSessionHandle session, IWriteOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.BulkWriteAsync(IClientSessionHandle session, IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WriteStateAsync() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 108
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.<Process>b__5_0() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 92
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WrapInTimeout[T](String actionName, TimeSpan timeout, Func`1 action) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 141
   at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.Process(EventHubMessage eventHubMessage) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 90
Exc level 1: MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
   at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync()
   at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(Int32 responseTo, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessageAsync(Int32 responseTo, IMessageEncoderSelector encoderSelector, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.WireProtocol.CommandUsingQueryMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.IsMasterHelper.GetResultAsync(IConnection connection, CommandWireProtocol`1 isMasterProtocol, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.ConnectionInitializer.InitializeConnectionAsync(IConnection connection, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
Exc level 2: System.IO.IOException: Unable to read data from the transport connection: Connection timed out.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
— End of stack trace from previous location where exception was thrown —
   at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync()
Exc level 3: System.Net.Sockets.SocketException: Connection timed out"

This indicate that Writing to mongoDB timeout after 954750.3796 ms!!!

And i can correlate this with more mongos going down.
After that long timeout occurred - processing start happen again and writing to mongoDB going with success.
Sometimes this situation can happen twice or more. (Hanging for really long time). 



 Comments   
Comment by Rachelle Palmer [ 25/Jun/20 ]

Hi there, thank you for reaching out to MongoDB. As this sounds more like a support issue, I wanted to give you some resources to get this question answered more quickly:

  • our MongoDB support portal, located at support.mongodb.com
  • our MongoDB community forums, located here
  • If you are an Atlas customer, there is free support offered 24/7 in the lower right hand corner of the UI.

Thank you!
Rachelle

Comment by vikrant raghuvanshi [ 15/Apr/20 ]

Is there any update on this issue?
It was raised in Jan 2019 and it still persists in April 2020. So, is someone really looking into it or not?

Comment by Jakub Dropia [ 28/Feb/19 ]

Hi! Any progress on this? 

Generated at Wed Feb 07 21:42:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.