[GODRIVER-1516] Go driver does not appear to obey DNS changes Created: 03/Mar/20 Updated: 27/Oct/23 Resolved: 16/Mar/20 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ben Birt | Assignee: | Divjot Arora (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Apologies if this isn't the right place to file this!
I'm running my client code (which uses the Go MongoDB driver) on Kubernetes, along with 3 hosts running MongoDB (in a replica set). The client is configured to connect to "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017". This works.
However, I have just noticed that if all Mongo replicas restart (and are given new IP addresses by Kubernetes), the client code doesn't seem to pick up this change; trying to run Mongo queries fails with 'connection reset by peer' errors, which indicate that the client is still trying to communicate with the old IP addresses.
I'm not sure what I should be doing here to handle this. Have we misconfigured some monitor setup, or do we need to configure some DNS resolution refresh frequency option? Should we be using "mongodb+srv" connection strings? I'm a little lost. |
| Comments |
| Comment by Divjot Arora (Inactive) [ 16/Mar/20 ] |
|
ben@dataform.co Glad to hear it worked! I'm going to close out this ticket as "Works as Designed" but feel free to leave another comment or open a new ticket if you have any other issues! – Divjot |
| Comment by Ben Birt [ 16/Mar/20 ] |
|
Amazing, thank you! I've set that variable, and everything appears to be working nicely:
Would you like me to close this issue? |
| Comment by Divjot Arora (Inactive) [ 13/Mar/20 ] |
|
I think we understand the root cause and have some ideas for fixing it. The issue happens because the isMaster responses report hardcoded IP addresses in the hosts field, and that field is what the driver uses to track the replica set members. The sequence of events is the following (this is a simplified view for a single-node replica set, but it generalizes to your three-node case as well):
Hopefully that gives you a sense of what's going on under the hood. Given the attached isMaster responses, this is expected behavior. On the https://github.com/cvallance/mongo-k8s-sidecar site, I noticed that it says:
The table of settings at https://github.com/cvallance/mongo-k8s-sidecar#settings also mentions the KUBERNETES_MONGO_SERVICE_NAME environment variable and the page later says
My understanding is that your app server is running internally in the k8s cluster, so you should be able to set the KUBERNETES_MONGO_SERVICE_NAME environment variable, which would cause the nodes to report hostnames in their isMaster responses and allow the driver to track hostnames instead of IP addresses. Is this correct? If you do make this change, you can verify that everything is correct via the isMaster response from any of the nodes.
– Divjot
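One way to run the check suggested above is to take the hosts list from an isMaster response and flag any entry that is an IP literal rather than a DNS name. The following is only a minimal sketch using the Go standard library: the helper name ipLiteralHosts is hypothetical, the IP addresses are the ones from the error output earlier in this ticket, and the hostname entries are illustrative guesses at what the nodes might report once KUBERNETES_MONGO_SERVICE_NAME is set, not values taken from the attached responses.

```go
package main

import (
	"fmt"
	"net"
)

// ipLiteralHosts returns the entries of hosts (each normally "host:port")
// whose host part is an IP literal rather than a DNS name.
// The name and signature are hypothetical; this is not part of the driver.
func ipLiteralHosts(hosts []string) []string {
	var out []string
	for _, h := range hosts {
		host, _, err := net.SplitHostPort(h)
		if err != nil {
			// No port present; treat the whole entry as the host.
			host = h
		}
		if net.ParseIP(host) != nil {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	// Addresses as they appear in the error output in this ticket.
	before := []string{"10.8.4.214:27017", "10.8.3.221:27017", "10.8.8.146:27017"}
	// Illustrative hostnames only (assumed, not from the attached responses).
	after := []string{"mongo-0.mongo:27017", "mongo-1.mongo:27017", "mongo-2.mongo:27017"}

	fmt.Println("IP literals before:", ipLiteralHosts(before))
	fmt.Println("IP literals after: ", ipLiteralHosts(after))
}
```

If the second list comes back empty for every node, the replica set is advertising hostnames, which is the state the driver needs in order to follow pod restarts.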
|
| Comment by Ben Birt [ 13/Mar/20 ] |
|
Sure, I've attached various files, let me know if you need something more. (Note that the StorageClass resource in the mongo YAML assumes use of GKE.) |
| Comment by Divjot Arora (Inactive) [ 13/Mar/20 ] |
|
Apologies for the back and forth on this ticket, but I have to ask for some more information to figure out where the IP addresses are coming from. Can you provide the following:
Given all of this, I'm going to try to reproduce this locally and see if anything sticks out. |
| Comment by Ben Birt [ 13/Mar/20 ] |
|
Thanks for the update!
That's right, those hostnames are from Kubernetes. If it's any help, I followed this guide (roughly, with a couple of edits) to run Mongo on k8s: https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/. If it's helpful, I can also share our k8s YAML with you. |
| Comment by Divjot Arora (Inactive) [ 12/Mar/20 ] |
|
ben@dataform.co Thanks for the error output, it's definitely helpful. One thing that's confusing me is why there are hardcoded IP addresses in the error output. Using an Atlas cluster, I've verified that the driver does not resolve hostnames to IP addresses at any point. Every time we make a new connection, we pass the hostname from the original connection string (in your case, this would be something like "mongo-0.mongo") to net.Dialer.DialContext(), which presumably does the correct DNS lookup. I'm still investigating on our end and just wanted to give you an update on where I am. Can you give any insight into where hostnames like "mongo-0.mongo" come from? Are they generated by Kubernetes? |
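To illustrate the dialing behavior described in this comment, here is a minimal standard-library sketch of handing an unresolved "host:port" string to net.Dialer.DialContext, so that the name is looked up each time a new connection is made. It is not the driver's code: the helpers dialByName and demo are hypothetical, a throwaway local listener stands in for a mongod, and "localhost" stands in for a name such as "mongo-0.mongo".

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// dialByName opens a TCP connection to addr ("host:port") and returns the
// remote address that was actually reached. addr is handed to DialContext
// unresolved, so the name is looked up again on every call.
func dialByName(ctx context.Context, addr string) (string, error) {
	d := net.Dialer{Timeout: 2 * time.Second}
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.RemoteAddr().String(), nil
}

// demo starts a throwaway local listener (standing in for a mongod) and
// dials it by name. "localhost" stands in for a name like "mongo-0.mongo".
func demo() (dialed, listening string, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()

	_, port, err := net.SplitHostPort(ln.Addr().String())
	if err != nil {
		return "", "", err
	}
	dialed, err = dialByName(context.Background(), net.JoinHostPort("localhost", port))
	return dialed, ln.Addr().String(), err
}

func main() {
	dialed, listening, err := demo()
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	fmt.Printf("dialed by name, reached %s (listener at %s)\n", dialed, listening)
}
```

The point of the sketch is that a hostname passed this way can follow a DNS change on the next new connection, whereas an IP literal such as "10.8.4.214:27017" is dialed as-is and can never follow one.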
| Comment by Ben Birt [ 12/Mar/20 ] |
|
Here's an example error, hopefully it helps! {{server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [ { Addr: 10.8.4.214:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(10.8.4.214:27017[-16521]) incomplete read of message header: read tcp 10.8.3.222:59930->10.8.4.214:27017: read: connection reset by peer }, { Addr: 10.8.3.221:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(10.8.3.221:27017[-16520]) incomplete read of message header: read tcp 10.8.3.222:38548->10.8.3.221:27017: read: connection reset by peer }, { Addr: 10.8.8.146:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(10.8.8.146:27017[-16519]) incomplete read of message header: read tcp 10.8.3.222:49938->10.8.8.146:27017: read: connection reset by peer }, ] } }} |
| Comment by Divjot Arora (Inactive) [ 11/Mar/20 ] |
|
Hi ben@dataform.co, can you post the full output of the error you get when running queries? |