[CSHARP-2390] Intermittent Server Selection Connection Timeout on Server Startup Created: 19/Sep/18  Updated: 11/Feb/19  Resolved: 11/Feb/19

Status: Closed
Project: C# Driver
Component/s: Connectivity
Affects Version/s: 2.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ali M [X] Assignee: Robert Stam
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB version 3.2.10 running on Windows Server 2008 R2



 Description   

This is a copy and paste of a post I made in the Google forum. It doesn't appear anyone is active there (I may be wrong) so I'm creating a bug report over here.


Here is some information about our setup:

  • MongoDB version: 3.2.10
  • Driver version: 2.5.0
  • Replicaset with 3 nodes (1 primary, 2 secondary)
  • MongoDB is hosted in Azure on a VM.
  • Services that connect with mongoDB are running in an Azure Web App.
  • Connection string: mongodb://username:password@server1.xxx.com:270xx,server2.xxx.com:270xx,server3.xxx.com:270xx/?replicaSet=repSetName&maxIdleTimeMS=60000"
     
    The problem: 
    When a new server instance is created due to autoscale trigger rules, sometimes the mongo client will throw the following error on the new instance (but not always; it's intermittent):
     
     
    A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 }

    }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "ReplicaSet", State : "Disconnected", Servers : [

    Unknown macro: { ServerId}

    , { ServerId: "

    { ClusterId : 1, EndPoint : "Unspecified/server1.azure.com:270xx" }

    ", EndPoint: "Unspecified/server1.azure.com:270xx", State: "Disconnected", Type: "Unknown" }, { ServerId: "

    { ClusterId : 1, EndPoint : "Unspecified/server2.azure.com:270xx" }

    ", EndPoint: "Unspecified/server2.azure.com:270xx", State: "Disconnected", Type: "Unknown" }] }.
       at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
       at MongoDB.Driver.Core.Clusters.Cluster.SelectServerHelper.WaitingForDescriptionToChange()
       at MongoDB.Driver.Core.Clusters.Cluster.SelectServer(IServerSelector selector, CancellationToken cancellationToken)
       at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterServerSelection(CancellationToken cancellationToken)
       at MongoDB.Driver.MongoClient.AreSessionsSupported(CancellationToken cancellationToken)
       at MongoDB.Driver.OperationExecutor.StartImplicitSession(CancellationToken cancellationToken)
       at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSession[TResult](Func`2 func, CancellationToken cancellationToken)
       at MongoDB.Driver.MongoCollectionImpl`1.FindSync[TProjection](FilterDefinition`1 filter, FindOptions`2 options, CancellationToken cancellationToken)
       at MongoDB.Driver.FindFluent`2.ToCursor(CancellationToken cancellationToken)
       at MongoDB.Driver.IAsyncCursorSourceExtensions.FirstOrDefault[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
       at MongoDB.Driver.IFindFluentExtensions.FirstOrDefault[TDocument,TProjection](IFindFluent`2 find, CancellationToken cancellationToken)
       ....
     
    The error will continue to appear until we restart the problematic instance. 
     
    I've collected network traces using Azure's diagnostic tools and I was able to find the trace file that corresponds to the bad instance throwing the error mentioned above. This is what I found:

  • There are periodic calls from the service to mongo for "isMaster" and "buildInfo" requests.The server responds successfully on each and every request.
  • On a good server instance, hitting an endpoint that makes a request to mongo will immediately contact mongo for requested data. 
  • On a bad server instance (one that is throwing the error mentioned above), the HTTP request will come in but will immediately throw a 500 error with the above error.
  • There are 0 errors on the server side being recorded.  
     
    So even though the driver can successfully communicate with the server using isMaster and buildInfo query requests, it fails to make any requests for data and immediately returns the above error message.
     
    I don't know if it means anything to you, but I've noticed that our connection string lists the servers as Primary (west-us region), Secondary1 (west-us region), and then Secondary2 (east-us region). The error message lists the servers as Secondary2, Primary, Secondary1. Though we have server instances in west and east US that are experiencing this issue.
     
    I'm stumped and would appreciate any help.


 Comments   
Comment by Ali M [X] [ 26/Nov/18 ]

Thanks Robert. We're taking a look at updating to 2.7.2. I'll get back to you soon.

Comment by Robert Stam [ 26/Nov/18 ]

The latest 2.7.2 release of the driver has some changes that improve connectivity robustness.

Can you please try using that version of the driver and report back if you still see this issue? If so we can advise you how to configure the driver to gather more information to help us diagnose the issue.

Generated at Wed Feb 07 21:42:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.