[CSHARP-1895] System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector Created: 17/Jan/17 Updated: 27/Oct/23 Resolved: 14/Jan/22 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.2.4 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Critical - P2 |
| Reporter: | Anton Hnatyshen | Assignee: | Unassigned |
| Resolution: | Works as Designed | Votes: | 16 |
| Labels: | question, rp-track | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
|
Hello, I have a .NET Core web application that reads data from a MongoDB replica set, but from time to time I get timeout errors. Connection String: mongodb://<first server name>,<second server name>,<third server name>/<database name>?replicaSet=rs0&connectTimeoutMS=30000 MongoDB Driver version is 2.2.4. Error message is: }, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 } }. ", EndPoint: "Unspecified/<first server name>", State: "Disconnected", Type: "Unknown" }, ", EndPoint: "Unspecified/<second server name>", State: "Disconnected", Type: "Unknown" }, ", EndPoint: "Unspecified/<third server name>", State: "Disconnected", Type: "Unknown" }] }. Stack Trace Message: To trace these errors I created a tool that periodically connects to MongoDB and logs the connection state. According to these logs, MongoDB was accessible at the time one of these timeouts occurred, so I'm pretty sure that it's not a network issue. |
| Comments |
| Comment by James Kovacs [ 08/Mar/23 ] | |||||||||||||||||||||||||||||||||||||||
|
For more information, see Why Does the Driver Throw a Timeout During Server Selection? in the MongoDB .NET/C# Driver FAQ. | |||||||||||||||||||||||||||||||||||||||
| Comment by James Kovacs [ 14/Jan/22 ] | |||||||||||||||||||||||||||||||||||||||
|
Please note that a server selection timeout exception is a symptom and not a cause. Possible root causes are many and varied. These range from issues in network connectivity to DNS to cluster misconfiguration to firewall to many others. You can troubleshoot the issue by comparing the type of the operation to "Client view of cluster state". For example if you are attempting to perform a write to your replica set, but the Client view of cluster state does not contain a node with Type: "ReplicaSetPrimary", then you should investigate why the application server cannot see the replica set's primary. Another valuable place to look is HeartbeatException properties in the cluster state as this will include error messages indicating why the last heartbeat to that node was unsuccessful. If you need assistance in diagnosing server selection issues, you have a few options:
| |||||||||||||||||||||||||||||||||||||||
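The comparison described in the comment above can also be made at runtime rather than by reading the exception text. The following is a minimal sketch, assuming a 2.x driver in which `IMongoClient.Cluster.Description.Servers` is available; the `ClusterStateDump` class and `Print` method are hypothetical names introduced only for illustration.

```csharp
using System;
using MongoDB.Driver;

// Hypothetical helper (not part of the driver): prints the client's current
// view of each server, including the last heartbeat error if there is one.
public static class ClusterStateDump
{
    public static void Print(IMongoClient client)
    {
        // Assumption: 2.x driver, where the cluster description is public.
        foreach (var server in client.Cluster.Description.Servers)
        {
            Console.WriteLine($"{server.EndPoint} State={server.State} Type={server.Type}");

            if (server.HeartbeatException != null)
            {
                // The heartbeat exception usually explains why the node is unreachable.
                Console.WriteLine($"  HeartbeatException: {server.HeartbeatException.Message}");
            }
        }
    }
}
```

Calling `ClusterStateDump.Print(client)` from the `catch` block that handles the `TimeoutException` shows, for a write, whether any node is currently seen as `ReplicaSetPrimary`.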
| Comment by Bouke Haarsma [ 14/Jan/22 ] | |||||||||||||||||||||||||||||||||||||||
|
After further investigation I've found the problem to be caused by memory pressure by the client application. Due to this memory condition, the execution of the program slowed down to a point where we started seeing timeouts. | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 14/Jan/22 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello bhaarsma@yellowtail.nl , this error message can appear for various reasons. In order to proceed, please provide the following information (as much as you can): 1. Full error message together with the stack trace. 2. All available logs together with the approximate times when the error was thrown. Turn on and provide the SdamInformation logs. You can do it by specifying the following client settings for MongoClient:
3. Do you have async tasks that are accidentally using blocking synchronous code? 4. Also, please send the SERVER logs from around the time of the issue. 5. It would be helpful to see your MongoClientSettings (without sensitive information) and how you create MongoClient, in particular whether you use a singleton pattern or not. | |||||||||||||||||||||||||||||||||||||||
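The settings snippet referred to in step 2 did not survive in this export. As a rough sketch of what enabling the SDAM log can look like, assuming a 2.x driver newer than 2.7.1 (the version named elsewhere in this thread) where `MongoClientSettings.SdamLogFilename` exists: the connection string, file path, and `SdamLoggingExample` class name below are placeholders, and the property may not be present in every driver version.

```csharp
using MongoDB.Driver;

// Hypothetical wrapper class, for illustration only.
public static class SdamLoggingExample
{
    public static IMongoClient CreateClientWithSdamLog()
    {
        // Placeholder connection string; substitute your own.
        var settings = MongoClientSettings.FromConnectionString(
            "mongodb://localhost:27017/?replicaSet=rs0");

        // Placeholder path. Each MongoClient that writes an SDAM log needs
        // its own file, otherwise the file is locked by the first client.
        settings.SdamLogFilename = @"c:\temp\sdam.log";

        return new MongoClient(settings);
    }
}
```

Because the log file is opened per cluster, this is best combined with a single shared `MongoClient` for the application.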
| Comment by Bouke Haarsma [ 14/Jan/22 ] | |||||||||||||||||||||||||||||||||||||||
|
I'm also seeing this issue intermittently with 2.14.0. Recently it started when we began using Parallel.ForEach to speed up some work over 16 concurrent threads, all starting a connection to the database around the same time. | |||||||||||||||||||||||||||||||||||||||
| Comment by James Kovacs [ 03/Dec/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, Cristian, Thank you for reaching out to us. Examining your cluster state, we can see:
This can occur if you connect to a replica set with name replSet0 but the actual replica set name is someOtherReplSetName. The driver detects that the requested replica set name (in the connection string or SRV record) does not match that returned by the server and the server is removed from the configuration. Given that this sounds like a support issue rather than a driver bug, I wanted to give you some resources to get this question answered more quickly:
Just in case you have already opened a support case and are not receiving sufficient help, please let me know and I can facilitate escalating your issue. Sincerely, | |||||||||||||||||||||||||||||||||||||||
| Comment by Cristian Florea [ 03/Dec/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, I'm experiencing the same TimeoutException issue in my integration tests. I'm using Mongo2Go library to set up a MongoDB instance in the tests: https://github.com/Mongo2Go/Mongo2Go#single-server-replica-set-mode-to-enable-transactions System.TimeoutException : A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "48", ConnectionMode : "Direct", Type : "ReplicaSet", State : "Disconnected", Servers : [] }. Stack Trace: | |||||||||||||||||||||||||||||||||||||||
| Comment by James Kovacs [ 02/Nov/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, Sahi, InvalidatedBecause:NoLongerPrimary happens when the driver receives a heartbeat from a new primary node and the old primary is invalidated. Your topology should show that one of your secondaries has taken over as primary, but it does not. The invalidation happened at 2021-10-31T15:05:24.4329029Z, but your other nodes show their last heartbeat and update 10 seconds earlier. It is also odd that the invalidated primary is disconnected. A race condition was accidentally introduced in 2.11.0 and was fixed in 2.11.6. Details can be found in the ticket for that fix. We are not aware of any other race conditions present in the cluster monitoring/heartbeat code. If you are not using one of the affected versions (2.11.0 to 2.11.5), we recommend opening a new CSHARP ticket and/or working with support to diagnose the issue. Sincerely, | |||||||||||||||||||||||||||||||||||||||
| Comment by Sahi kakkar [ 02/Nov/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Thanks, James, for your reply. We will connect with MongoDB support. In case you have a suggestion, the client view of cluster state is below. We have 3 nodes: the 2 secondaries are in a connected state and the 1 primary is disconnected with the reason "no longer primary". Our code doesn't specifically look for the primary. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: " { ClusterId : 1, EndPoint : "Unspecified/server" }", EndPoint: "Unspecified/server", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.11, TopologyVersion: , Type: "ReplicaSetSecondary", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-10-31T15:05:14.4403765Z", LastUpdateTimestamp: "2021-10-31T15:05:14.4403765Z" }, { ServerId: " { ClusterId : 1, EndPoint : "Unspecified/server" }", EndPoint: "Unspecified/server", ReasonChanged: "InvalidatedBecause:NoLongerPrimary", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2021-10-31T15:05:24.4329029Z" }, { ServerId: " { ClusterId : 1, EndPoint : "Unspecified/server" }", EndPoint: "Unspecified/server", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.11, TopologyVersion: , Type: "ReplicaSetSecondary", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-10-31T15:05:14.4713766Z", LastUpdateTimestamp: "2021-10-31T15:05:14.4713766Z" }] }.
| |||||||||||||||||||||||||||||||||||||||
| Comment by James Kovacs [ 01/Nov/21 ] | |||||||||||||||||||||||||||||||||||||||
|
This issue is not a bug in the driver, but is an indication that your application cannot reach the cluster due to a network error or misconfiguration. Examining the "client view of cluster state" in the exception message can often provide clues as to the origin of the problem. If the state of each node is "disconnected" that indicates that your application cannot reach the cluster nodes. If the server list is empty, that is indicative of a misconfiguration - such as mismatched replica set name or FQDN of servers not matching the names in the replica set configuration. If the states of all nodes are "secondary" but you're trying to read (with the default read preference of primary) or write, then your application cannot find the current primary node. As this sounds like a support issue, I wanted to give you some resources to get this question answered more quickly:
Sincerely, | |||||||||||||||||||||||||||||||||||||||
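The three checks described in the comment above can be sketched as a small text classifier over the exception message. This is an illustration only: the `ClusterStateTriage` class and its result strings are hypothetical, and the regular expressions assume the message layout seen in the exceptions quoted in this ticket (per-server fields written as `State: "..."` and `Type: "..."`), which may differ between driver versions.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: maps a server selection timeout message to one of the
// three situations described above. It only inspects text; it does not connect.
public static class ClusterStateTriage
{
    public static string Classify(string exceptionMessage)
    {
        // Empty server list: suggests a replica set name or host name mismatch.
        if (Regex.IsMatch(exceptionMessage, @"Servers\s*:\s*\[\s*\]"))
            return "EmptyServerList";

        // Per-server fields have no space before the colon in the quoted messages,
        // which distinguishes them from the cluster-level "State :" and "Type :".
        var states = Regex.Matches(exceptionMessage, @"\bState: ""(\w+)""")
            .Cast<Match>().Select(m => m.Groups[1].Value).ToList();
        var types = Regex.Matches(exceptionMessage, @"\bType: ""(\w+)""")
            .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

        // Every node disconnected: the application cannot reach the cluster.
        if (states.Count > 0 && states.All(s => s == "Disconnected"))
            return "AllServersDisconnected";

        // Replica set members are visible, but none of them is the primary.
        if (types.Any(t => t.StartsWith("ReplicaSet")) && !types.Contains("ReplicaSetPrimary"))
            return "NoPrimaryVisible";

        return "Inconclusive";
    }
}
```

`NoPrimaryVisible` only matters for writes and for reads with the default read preference of primary. For the other results, the `HeartbeatException` text in the message is the next place to look.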
| Comment by Sahi kakkar [ 01/Nov/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Is there any solution to this issue? We are facing the same problem intermittently. | |||||||||||||||||||||||||||||||||||||||
| Comment by Joshua Gbogodor [ 27/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Thanks, @Boris Dogadov it's working now | |||||||||||||||||||||||||||||||||||||||
| Comment by Boris Dogadov [ 27/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||
|
According to the provided stack trace, the initial connection can't be established due to a network error. I would suggest ensuring that the MongoDB server is accessible and that there are no network issues. | |||||||||||||||||||||||||||||||||||||||
| Comment by Joshua Gbogodor [ 27/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, this issue still persists even with version 2.13.1. I even went further and downgraded the driver version to 2.12.0, with the same result. Currently using:
Exception: System.TimeoutException: 'A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: " { ClusterId : 1, EndPoint : "Unspecified/localhost:27017" }", EndPoint: "Unspecified/localhost:27017", ReasonChanged: "Heartbeat", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 27/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello marko.saravanja@diplomat.ba , please provide the following information: 1. Full error message together with the stack trace. 2. All available logs together with the approximate times when the error was thrown. Turn on and provide the SdamInformation logs. You can do it by specifying the following client settings for MongoClient:
3. Do you have async tasks that are accidentally using blocking synchronous code? 4. Also, please send the SERVER logs from around the time of the issue.
Please let me know if you need my assistance with the steps proposed above. | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko S [ 27/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, still experiencing issues with this. Details: Currently using:
Exception:
{{A timeout occurred after 30000ms selecting a server using CompositeServerSelector , ", ", ", The error happens in 5% of cases when the ramp-up period of virtual users is long (120 sec), but in 99% of cases when the ramp-up is 0 or a small number (like 5 sec). | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 15/Apr/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello abdulmoiz.baig.work@gmail.com , yes, please check the latest driver release (or at least 2.11.6) and let us know whether you still can see your issue. Thanks. | |||||||||||||||||||||||||||||||||||||||
| Comment by Abdul Moiz Baig [ 15/Apr/21 ] | |||||||||||||||||||||||||||||||||||||||
|
I am facing exactly the same issue described by aggarwal.shipra98@gmail.com. I am on driver version 2.11.1; the complete exception I am getting is given below. A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 }}. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: " { ClusterId : 1, EndPoint : "172.24.80.12:27017" }", EndPoint: "172.24.80.12:27017", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.12, TopologyVersion: , Type: "ReplicaSetSecondary", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-04-15T12:13:59.1135790Z", LastUpdateTimestamp: "2021-04-15T12:13:59.1135792Z" }, { ServerId: " { ClusterId : 1, EndPoint : "172.24.80.13:27017" }", EndPoint: "172.24.80.13:27017", ReasonChanged: "InvalidatedBecause:NoLongerPrimary", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2021-04-15T12:13:59.1491878Z" }, { ServerId: " { ClusterId : 1, EndPoint : "172.24.80.14:27017" }", EndPoint: "172.24.80.14:27017", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.12, TopologyVersion: , Type: "ReplicaSetSecondary", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-04-15T12:13:49.1506928Z", LastUpdateTimestamp: "2021-04-15T12:13:49.1506929Z" }] }. We face this issue specifically whenever I shut down my primary node (172.24.80.14) and then turn it back on. The priority of node 172.24.80.14 is higher than that of the other nodes in the cluster, so when it rejoins the replica set it triggers a re-election and becomes primary. Whenever I perform this activity I face this timeout issue. Can you please tell me if this issue has been addressed in later versions such as 2.11.6?
Let me know if you need any further information in this regard. Thanks, Moiz | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello nadeemkhoury@gmail.com, can you also provide the whole error message you see in your case? | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello aggarwal.shipra98@gmail.com , can you please check the driver release 2.11.6. Your description looks like the case which we fixed in the scope of https://jira.mongodb.org/browse/CSHARP-3302. Thanks. | |||||||||||||||||||||||||||||||||||||||
| Comment by Shipra Aggarwal [ 05/Mar/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hey! A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector { AllowedLatencyRange = 00:00:00.0150000 }}. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: " { ClusterId : 1, EndPoint : "Unspecified/<server 1>" }", EndPoint: "Unspecified/<server 1>", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.12, TopologyVersion: , Type: "ReplicaSetSecondary", Tags: "{ nodeType : ELECTABLE, region : region, workloadType : OPERATIONAL, provider : AWS }", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-03-04T16:09:58.1856150Z", LastUpdateTimestamp: "2021-03-04T16:09:58.1856150Z" }, { ServerId: " { ClusterId : 1, EndPoint : "Unspecified/<replica server 2>" }", EndPoint: "Unspecified/<replica server 2>", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 4.2.12, TopologyVersion: , Type: "ReplicaSetSecondary", Tags: "{ workloadType : OPERATIONAL, nodeType : ELECTABLE, region : region, provider : AWS }", WireVersionRange: "[0, 8]", LastHeartbeatTimestamp: "2021-03-04T16:09:48.2185236Z", LastUpdateTimestamp: "2021-03-04T16:09:48.2185236Z" }, { ServerId: " { ClusterId : 1, EndPoint : "Unspecified/<Server 3>" }", EndPoint: "Unspecified/<Server 3>", ReasonChanged: "InvalidatedBecause:NoLongerPrimary", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2021-03-04T16:09:58.2168719Z" }] }. | |||||||||||||||||||||||||||||||||||||||
| Comment by Nadeem Khoury [ 24/Feb/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi @Dmitry Lukyanov , actually no, I don't define any MongoClientSettings. All the settings are included in the connection string I shared above. I create the MongoClient and get my database as follows:
var database = new MongoClient("mongodb+srv://#userName:#password@cluster0.eqam6.gcp.mongodb.net/database_name?retryWrites=true&w=majority&connect=replicaSet ").GetDatabase(settings.DatabaseName);
If there is a need to try any MongoClientSettings, please let me know. The version I am using is 2.11.6.
Thanks | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 23/Feb/21 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello nadeemkhoury@gmail.com , do you configure MongoClientSettings, if so, can you provide it? Also, please provide the whole error message. | |||||||||||||||||||||||||||||||||||||||
| Comment by Nadeem Khoury [ 23/Feb/21 ] | |||||||||||||||||||||||||||||||||||||||
|
I have the same issue. I create a mongo client for every request or service. Sometimes it works perfectly without any issue, and sometimes it doesn't work. Here is the connection string to the database.
mongodb+srv://#userName:#password@cluster0.eqam6.gcp.mongodb.net/database_name?retryWrites=true&w=majority&connect=replicaSet.
It causes a lot of problems, and I can't find the reason. Could you please help with this? | |||||||||||||||||||||||||||||||||||||||
| Comment by Bálint Nagy [ 14/Dec/20 ] | |||||||||||||||||||||||||||||||||||||||
|
I created a MongoDB Atlas account and experience the same phenomenon. From time to time I get timeouts in my integration tests. I create and destroy entire service collections with "MongoClients", so I'm well aware that the client is recreated between tests, though it's only 20 clients altogether. Part of the exception text: }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [], DnsMonitorException : "DnsClient.DnsResponseParseException: Response parser error, 244 bytes available, tried to read 1 bytes at index 244. | |||||||||||||||||||||||||||||||||||||||
| Comment by Marco Fontana [ 13/Dec/20 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello, we are experiencing the same problem, except that we are not using any clusterConfiguration. We noticed that it happens frequently in our production environment during peak hours and disappears a few hours after the peak is over. I can also say that this issue started when we moved from a Single configuration of the MongoDB server to a ReplicaSet configuration. Can you confirm that this issue is not patched in the latest version of the MongoDB C# driver and that it is an issue with Tasks as described here: Any fix for this? We are not able to use the C# Driver with MongoDB because of this problem, unfortunately. Thank you, Marco | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 12/Dec/19 ] | |||||||||||||||||||||||||||||||||||||||
|
av-1991@yandex.ru, see this recommendation on how to re-use a MongoClient instance: http://mongodb.github.io/mongo-csharp-driver/2.10/reference/driver/connecting/#re-use. In short, it's better to reuse one global MongoClient everywhere. Since your implementation uses clusterConfiguration, each new MongoClient instance creates a number of background tasks as well as a separate connection pool with separate connections. So, try to use a singleton MongoClient. Let me know if you have any questions.
| |||||||||||||||||||||||||||||||||||||||
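The singleton recommendation above can be sketched as follows, keeping the `ClusterConfigurator`-based command logging that the original poster wanted. This is only one possible shape: the `MongoClientHolder` class and the `MONGODB_URI` environment variable are hypothetical names, and the event subscription assumes a 2.x driver where `ClusterBuilder.Subscribe<CommandStartedEvent>` is available.

```csharp
using System;
using MongoDB.Driver;
using MongoDB.Driver.Core.Events;

// Hypothetical holder class: creates exactly one MongoClient for the process.
public static class MongoClientHolder
{
    private static readonly Lazy<IMongoClient> LazyClient = new Lazy<IMongoClient>(() =>
    {
        // Assumption: the connection string comes from an environment variable;
        // the localhost fallback is a placeholder.
        var connectionString =
            Environment.GetEnvironmentVariable("MONGODB_URI") ?? "mongodb://localhost:27017";
        var settings = MongoClientSettings.FromConnectionString(connectionString);

        // Configured once, so only one cluster (monitoring tasks plus
        // connection pool) is created for the whole application.
        settings.ClusterConfigurator = builder =>
            builder.Subscribe<CommandStartedEvent>(e =>
                Console.WriteLine($"{e.CommandName}: {e.Command}"));

        return new MongoClient(settings);
    });

    // Lazy<T> is thread-safe by default, so concurrent callers share one instance.
    public static IMongoClient Client => LazyClient.Value;
}
```

Callers then use `MongoClientHolder.Client.GetDatabase(...)` instead of constructing a new client per request. In an application with dependency injection, registering the client with a singleton lifetime achieves the same effect.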
| Comment by Valeriy Abakumov [ 12/Dec/19 ] | |||||||||||||||||||||||||||||||||||||||
|
@Dmitry Lukyanov hi! That is how we create Mongo client:
I have just realized that I use the `ClusterConfigurator` property; I don't know how to set up logging of Mongo queries any other way. | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 09/Dec/19 ] | |||||||||||||||||||||||||||||||||||||||
|
I was referring to clusters that are configured on the driver side (not to the server configuration). So, can you describe how you are creating MongoClient and configuring MongoClientSettings? In particular, are you passing different MongoClientSettings options to different MongoClient instances? | |||||||||||||||||||||||||||||||||||||||
| Comment by Valeriy Abakumov [ 03/Dec/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello @Dmitry Lukyanov, no, rs.status() reports NoReplicationEnabled. The MongoDB version is 3.6.15 and the compatibility version is 3.6. | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 02/Dec/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello av-1991@yandex.ru, it looks like you have several clusters in your project. If so, try to add different SdamLogFilename paths for each cluster. Please let me know if you still have any questions. | |||||||||||||||||||||||||||||||||||||||
| Comment by Valeriy Abakumov [ 29/Nov/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello! I have run into the same problem. I tried to enable SdamLogFilename, but I get the error "The process cannot access the file 'c:\temp\nlog\app\sdamlogs.log' because it is being used by another process". What am I doing wrong? MongoDb.Driver is 2.9.2. Stack trace of this error: at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle) | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 22/Oct/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello artsiom.rusak@cloudcall.com, please provide the following information (as much as you can):
| |||||||||||||||||||||||||||||||||||||||
| Comment by Artsiom Rusak [ 22/Oct/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello, guys. | |||||||||||||||||||||||||||||||||||||||
| Comment by Laxman Rapolu [ 20/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Thanks, Jeffrey. Will try the new version (2.8) and see how it goes with this error. | |||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 20/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||
|
2.9.0 was just released last week, and 2.9.1 should be out shortly with a bug fix that may interest you. Otherwise use the latest 2.8 patch release. | |||||||||||||||||||||||||||||||||||||||
| Comment by Laxman Rapolu [ 20/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Thanks for the quick update, @Jeffrey Yemin. Which version would you recommend? | |||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 20/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||
|
laxmankumar.rapolu@gmail.com currently we have no remaining hypotheses to explain connectivity issues. But if you're on version 2.5 of the driver you should definitely upgrade as we have made improvements since that release. | |||||||||||||||||||||||||||||||||||||||
| Comment by Laxman Rapolu [ 20/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi all, just wanted to check whether this issue got resolved. We are using driver version 2.5.0.0 and Cosmos DB with the MongoDB API. We are getting this exact same error (pasted in the description) intermittently. I could reproduce this error locally by running hundreds of data writes and reads in parallel, but even then it is intermittent: I succeeded in reproducing it only 2 out of 10 times. I couldn't find a pattern to reproduce this. I am planning to try Frank Zheng's solution, but we also wanted to see whether anyone else has solved this with other changes. This is very frustrating, as it is happening in PROD, and we are a little concerned because we are seeing at least 1-3 errors every 2 days. | |||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 15/Apr/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi craigk0609, you should have a reply on the support case now. We don't think the root cause we identified for your situation is the same as that of the original reporter of this ticket, so we're going to leave this one open. For others watching this ticket: one possible root cause of this symptom is performance of a full restore of an Atlas backup. Currently, this process also restores the replica set configuration in a way that prevents the .NET (or any) driver from re-connecting to the replica set without creating a new MongoClient. The Atlas team is working on a fix for this.
| |||||||||||||||||||||||||||||||||||||||
| Comment by Craig Kennedy [ 12/Apr/19 ] | |||||||||||||||||||||||||||||||||||||||
|
I had a support case (00551671) that they were able to reproduce this. Logs, connection string, code snippet, etc are attached to that ticket. We can reproduce the issue regularly by restoring an atlas cluster from our production environment to a preproduction environment. Upon the restored preproduction environment coming up, the 30 second timeout condition occurs until we restart our applications. | |||||||||||||||||||||||||||||||||||||||
| Comment by Dmitry Lukyanov (Inactive) [ 12/Apr/19 ] | |||||||||||||||||||||||||||||||||||||||
|
It looks like this ticket contains descriptions of several issues that lead to the same `TimeoutException`. If your issue still reproduces, please provide the following in addition to the previous information: 2. Full error message together with the stack trace. 3. All available logs together with the approximate times when the error was thrown. Make sure that you use a DRIVER version higher than 2.7.1. Then turn on and provide the SdamInformation logs. You can do it by specifying the following client settings for MongoClient:
Also, please send the SERVER logs. Please let me know if you need my assistance with the steps proposed above. | |||||||||||||||||||||||||||||||||||||||
| Comment by Robert Stam [ 15/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
The issue appears to be related to the heartbeat Tasks not getting scheduled. Switching to a sync heartbeat thread (which we've already considered) should help with that, especially if the heartbeat threads are given AboveNormal priority. | |||||||||||||||||||||||||||||||||||||||
| Comment by Frank Zheng [ 14/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi @austinfelipe, this issue has lasted for more than two years. There are many reports of it here and there. I got a solution from here: https://stackoverflow.com/questions/38859755/system-timeoutexception-a-timeout-occured-after-30000ms-selecting-a-server-usin/51281357#51281357 In simple terms, the MongoDB C# driver is not using the async pattern correctly. When TcpStreamFactory.ResolveEndPointsAsync calls Dns.GetHostAddressesAsync, it hangs there. The reason is that, when the Mongo connection is not started on the main thread, the .NET Framework uses a default task scheduler that creates thread-pool threads for each task. This means that when there are more than 300 tasks, it starts waiting 1 second before starting each new thread. Those 300 tasks will block Dns.GetHostAddressesAsync from running for 300 seconds, which causes the timeout. This happens when you are not executing the connection on the main thread. For the detailed explanation, check my link. The first paragraph there also answers your question of how to reproduce this error. You can skip steps 1 and 2 if you don't want to know the root cause. Sorry that I am not able to provide the source code of my solution on Stack Overflow. However, it should be easy to do it yourself with a little help from ILSpy, by looking at the source code of the System.Threading.Tasks.ConcurrentExclusiveSchedulerPair.ConcurrentExclusiveTaskScheduler class in the .NET Framework DLL.
Good luck. Thanks,
| |||||||||||||||||||||||||||||||||||||||
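The custom task scheduler described in the comment above was not published. A different and simpler mitigation that is sometimes tried when thread-pool starvation is suspected is to raise the pool's minimum worker-thread count, so that new threads are created on demand up to that floor without the injection delay. The sketch below is not Frank Zheng's solution and not an official driver recommendation; the `ThreadPoolHeadroom` class name is hypothetical, and only the standard `ThreadPool.GetMinThreads` and `ThreadPool.SetMinThreads` calls are used.

```csharp
using System;
using System.Threading;

// Hypothetical helper: raises the thread pool's worker-thread floor so that
// a burst of queued tasks does not delay short async continuations.
public static class ThreadPoolHeadroom
{
    // Returns true if the floor is already at least desiredWorkers,
    // or if it was raised successfully. Never lowers the current value.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int workers, out int ioThreads);

        if (workers >= desiredWorkers)
            return true;

        // SetMinThreads returns false if the request is rejected,
        // for example when it exceeds the pool's maximum.
        return ThreadPool.SetMinThreads(desiredWorkers, ioThreads);
    }
}
```

A call such as `ThreadPoolHeadroom.EnsureMinWorkerThreads(64)` at application startup is one way to test the hypothesis; the value 64 is arbitrary. If the timeouts disappear, the underlying cause is likely blocking work on pool threads, which is better fixed at the source.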
| Comment by Jeffrey Yemin [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
That's good to know austinfelipe. Could you provide any additional information that you have, e.g.
Thanks, | |||||||||||||||||||||||||||||||||||||||
| Comment by Austin Felipe [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Jeff, I was using 2.7.3 when I got this error. I'm not sure how to reproduce this error. By the way, I just posted here because I thought the issue was still open, I could create another issue if you wish. | |||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
The fix to Are you able to reproduce the scenario in which an older driver version failed to reconnect to the replica set primary? If so, we'd like to know if you're able to reproduce it with the 2.7.3 driver? Or are you saying that you've already seen the issue using the 2.7.3 driver? Note that the original reporter was using version 2.2.4. | |||||||||||||||||||||||||||||||||||||||
| Comment by Austin Felipe [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
I'm using version 2.7.3 already. How should I run these tests? It is really hard to test because it doesn't happen all the time. Is there any config that I should change? | |||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
We'd like to know if the root cause of this problem was addressed by the fix to | |||||||||||||||||||||||||||||||||||||||
| Comment by Austin Felipe [ 01/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||
|
Does anyone know a workaround? It's really frustrating actually. I have the same setup using Asp Net Core + IoC and etc. | |||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Myagkikh [ 20/Nov/18 ] | |||||||||||||||||||||||||||||||||||||||
|
On Windows the same code works without timeouts. | |||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Myagkikh [ 20/Nov/18 ] | |||||||||||||||||||||||||||||||||||||||
|
Same issue for me: Environment: After the number of requests increases, they begin to fail with timeouts. In the logs:
| |||||||||||||||||||||||||||||||||||||||
| Comment by Frank Zheng [ 18/Jul/18 ] | |||||||||||||||||||||||||||||||||||||||
|
If you have no connectivity issue with MongoDB by using other tools like Robo3T, please open the Tasks window in the VS to check if there are lots of tasks and they are blocking. https://stackoverflow.com/questions/38859755/system-timeoutexception-a-timeout-occured-after-30000ms-selecting-a-server-usin/51281357#51281357 | |||||||||||||||||||||||||||||||||||||||
| Comment by Matt Sneller [ 23/Mar/18 ] | |||||||||||||||||||||||||||||||||||||||
|
I'm having this same issue. It seems to happen when the network abruptly drops. In my case, I have MongoDB running on Ubuntu VMs in Azure. When a VM loses all network connectivity, the remaining MongoDB servers elect a new primary, as expected. However, I have to restart all clients to get a connection again. It looks like this only happens with the C# driver; I have a few Python services that do not have this problem. | |||||||||||||||||||||||||||||||||||||||
| Comment by ALOK MEHROTRA [ 05/Nov/17 ] | |||||||||||||||||||||||||||||||||||||||
|
Hi, Facing similar issue with version "2.4.4". Have shared full details in Regards | |||||||||||||||||||||||||||||||||||||||
| Comment by Kevin Fairs [ 31/Oct/17 ] | |||||||||||||||||||||||||||||||||||||||
|
Hello Is there any update to this, or any timescale for resolution? I am also experiencing this in very similar circumstances when connecting to Atlas using 2.4.0 Thanks |