[CSHARP-1900] Intermittent loss of connectivity to Atlas instance from peered VPC Created: 20/Jan/17  Updated: 21/Mar/17  Resolved: 21/Mar/17

Status: Closed
Project: C# Driver
Component/s: Configuration, Connectivity
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kyle Sullens
Assignee: Unassigned
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2012 instance in my VPC. Peered Atlas VPC.



 Description   

.NET web application uses C# Driver 2.3.0 to access MongoDB. The app runs on an EC2 instance in my AWS account. The instance is in a VPC that is peered to the Atlas VPC. The network ACL allows all traffic to/from the Atlas VPC. The security group is wide open.

IPs of my VPC are whitelisted in the Atlas security config.

From my EC2 instance, I am able to connect to the Atlas instance from the mongo shell. A PING of the Atlas hostname resolves to the correct IP.

The application was working as expected earlier in the day. No code or configuration changes occurred. The application is now unable to connect to the instance. The following error occurs:

A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServerSelector{ ReadPreference = { Mode = Primary, TagSets = [] } }, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/sts-armor-01-shard-00-00-42ma8.mongodb.net:27017" }", EndPoint: "Unspecified/sts-armor-01-shard-00-00-42ma8.mongodb.net:27017", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.



 Comments   
Comment by Robert Seaborn [ 10/Feb/17 ]

Are you sure about your Atlas VPC ID? ( vpc-284*4c )
Because that looks like it's possibly your own VPC ID (note the "vpc" prefix).
On the last line in the comment above "vpc-284*4c" is my VPC ID and "pcx-e948*0" is my Peering ID to Atlas (note the "pcx" prefix).

Comment by Robert Seaborn [ 10/Feb/17 ]

I thought the 10.0 would be reserved for VPC Peering networks only. It looks like we have similar setups.

My setup:
VPC CIDR Block: 172.**.0.0/16
Atlas CIDR Block: 10.0.0.0/16
Atlas VPC ID: pcx-e948*0

Route table:
172.**.0.0/16 local
0.0.0.0/0 igw-7c8*18
10.0.0.0/16 pcx-e948*0

In Atlas, I have this as my IP Whitelist: 172.**.0.0/16 which corresponds to the private CIDR of my VPC.
On my Atlas Peering tab, I have: vpc-284*4c - Available - pcx-e948*0 - 172.**.0.0/16 - button (Terminate)

Comment by Kyle Sullens [ 09/Feb/17 ]

Very exciting that you are up and running.

I'm confused about your CIDR scheme. Here is my setup:
VPC CIDR Block: 10.28.0.0/16
Atlas CIDR Block: 192.168.248.0/21
Atlas VPC ID: vpc-dbb60dbc

The route table on my VPC contains the following:
10.28.0.0/16 local
0.0.0.0/0 igw-e80d448c (internet gateway)
192.168.248.0/21 pcx-1ce86975

In Atlas, I have the following range in the IP Whitelist: 10.28.0.0/16, which corresponds to the private CIDR of my VPC

The only way for me to get a DB connection from the server is to open the ACL for all traffic to/from 0.0.0.0/0 for the subnet that contains my EC2 instance. This tells me that the DB traffic is using the public network and not the peered connection.

Let me know if you see anything in my setup that is suspect. Maybe an Atlas tech will chime in here?......
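
The routing question raised in this comment can be checked by hand with a longest-prefix match, which is how a VPC route table chooses among overlapping routes (the most specific one wins). The following is a minimal Python sketch, assuming the three route entries listed in this comment. The helper name `pick_target` and the sample address 192.168.248.10 are illustrative assumptions; 52.42.162.38 is the public address reported elsewhere in this thread.

```python
import ipaddress

# Route table entries copied from this comment: (destination CIDR, target).
ROUTES = [
    ("10.28.0.0/16", "local"),
    ("0.0.0.0/0", "igw-e80d448c"),
    ("192.168.248.0/21", "pcx-1ce86975"),
]


def pick_target(dest_ip, routes=ROUTES):
    """Return the target of the most specific route containing dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    best = None  # (network, target) of the longest matching prefix so far
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


# Hypothetical address inside the Atlas block: matches the peering route.
print(pick_target("192.168.248.10"))
# A public address: only the default route matches, so it goes to the internet gateway.
print(pick_target("52.42.162.38"))
```

If the Atlas hostname resolves to a public address, only the 0.0.0.0/0 route matches, which would be consistent with the connection working only when the ACL is open to 0.0.0.0/0.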

Comment by Robert Seaborn [ 09/Feb/17 ]

Actually, I didn't see that. I was just using the step-through guide on the Peering setup page (which I thought contained all the necessary instructions), and it doesn't have that step.

The problem with this guide is that the CIDR blocks are reversed. They shouldn't be that way. My VPC CIDR block is 172.**.0.0/16 and my Atlas CIDR block is 10.0.0.0/16
https://www.mongodb.com/blog/post/introducing-vpc-peering-for-mongodb-atlas

I enabled the DNS setting, and the hostname now seems to resolve to a private IP address, but my web servers can't ping the MongoDB servers.
ping zlor1-shard-00-00-bonqv.mongodb.net
Pinging ec2-52-42-*****.us-west-2.compute.amazonaws.com [10.0.***.150] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 10.0.***.150:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

I looked at my route tables and suspected that I had misread the documentation or that the steps I took were wrong. For the peering connection in my route table, I had 172.*.0.0/16 as my Destination and pcx-e948*0 as the Target (in addition to 172.*.0.0/16 that mapped to local). After I changed the Destination to 10.0.0.0/16, my pings were successful again.

I removed the open IP block 0.0.0.0/0 from my cluster whitelist, and my pings and app were still successful. However, I found that I still needed 172.**.0.0/16 in my whitelist, which is ok since that's my VPC CIDR block.

Now everything is working great and much faster!

Comment by Kyle Sullens [ 09/Feb/17 ]

Robert -

Assuming you've reviewed this tutorial:
https://www.mongodb.com/blog/post/introducing-vpc-peering-for-mongodb-atlas

Check out step #6 "Enable DNS Hostnames". This may be the cause of your server's inability to resolve the host name to the private IP.

Comment by Robert Seaborn [ 08/Feb/17 ]

I'm using one of the servers in my cluster connection string. I can also ping this from my home computer and get the same IP address. However, tracert fails from my home computer, but I'm not sure why.

ping zlor1-shard-00-00-bonqv.mongodb.net
Pinging ec2-52-42-162-38.us-west-2.compute.amazonaws.com [52.42.162.38] with 32 bytes of data:
Reply from 52.42.162.38: bytes=32 time=34ms TTL=42
Reply from 52.42.162.38: bytes=32 time=19ms TTL=42
Reply from 52.42.162.38: bytes=32 time=21ms TTL=42
Reply from 52.42.162.38: bytes=32 time=23ms TTL=42
Ping statistics for 52.42.162.38:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 19ms, Maximum = 34ms, Average = 24ms

The tracert from my EC2 instance in my VPC does not resolve a private IP address.
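
One way to tell which path a connection will take is to test whether the resolved address falls inside the Atlas CIDR block. The following is a minimal Python sketch, assuming the Atlas CIDR block 10.0.0.0/16 given in the setup listing elsewhere in this thread and the address from the ping output above. The helper name `uses_peering` is hypothetical.

```python
import ipaddress


def uses_peering(resolved_ip, atlas_cidr):
    """Return True if resolved_ip lies inside the Atlas VPC CIDR block."""
    return ipaddress.ip_address(resolved_ip) in ipaddress.ip_network(atlas_cidr)


# Assumption: Atlas CIDR block as stated in the setup listing in this thread.
ATLAS_CIDR = "10.0.0.0/16"

# Address taken from the ping output above.
resolved = "52.42.162.38"

print(uses_peering(resolved, ATLAS_CIDR))           # False: outside the Atlas block
print(ipaddress.ip_address(resolved).is_private)    # False: a public address
```

On a live host, the address could be obtained with `socket.gethostbyname(hostname)` instead of being hard-coded.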

Comment by Kyle Sullens [ 08/Feb/17 ]

Screenshot from WireShark:

Comment by Kyle Sullens [ 08/Feb/17 ]

Robert -

I want to replicate your test to confirm I'm seeing the same issue. Where did you get the address ec2-52-42-162-38.us-west-2.compute.amazonaws.com? I assume this is the address of your Atlas instance and you are showing the tracert from your application server on EC2. Where did you find the address for your Atlas instance? I didn't see it published anywhere on the Atlas dashboard.

When I run a tracert from my EC2 instance using xxxx-xxxx-xxx-shard-00-00-42ma8.mongodb.net (my Atlas instance), it responds with the correct private (192.168.x.x) IP.

My IP whitelist in Atlas is set to only allow traffic from my EC2 (peered) VPC. However, in my VPC, I have the ACL set to allow all traffic in and out (wide open security). In this configuration, my app can communicate with the Atlas instance.

When I secure the ACL by removing the ALL TRAFFIC rules and allowing only traffic to/from the 192.168.x.x/x (CIDR of the Atlas VPC), my application is not able to connect to the Atlas server. This tells me the communication is not occurring over the peered network, but over the public internet, as you indicate.

Comment by Robert Seaborn [ 07/Feb/17 ]

I think the problem is that the connection string we have uses public domain names, which aren't reachable from private networks. When I ping one of the servers, it shows a public IP address, not a private one. Here are the results of my tracert:
Tracing route to ec2-52-42-162-38.us-west-2.compute.amazonaws.com [52.42.162.38]
over a maximum of 30 hops:
1 * * * Request timed out.
2 <1 ms <1 ms <1 ms ec2-52-42-162-38.us-west-2.compute.amazonaws.com [52.42.162.38]

I think the first hop represents the connection to the private network, and the second represents the connection to the public network. When I open the IP whitelist to anyone (public), I'm able to connect to my servers from my web server.

Is there a different connection string we should use to connect via the private network (peering connection)?

Comment by Robert Seaborn [ 07/Feb/17 ]

Having the exact same issue. .NET web application uses the latest C# Driver 2.4.2 to access MongoDB.
