|
From Jorik on PR#459:
Hello @JamesKovacs
This is the exception we're getting:
System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [], DnsMonitorException : "DnsClient.DnsResponseException: Header id mismatch.
|
at DnsClient.DnsUdpMessageHandler.Query(IPEndPoint server, DnsRequestMessage request, TimeSpan timeout)
|
at DnsClient.LookupClient.ResolveQuery(IReadOnlyList`1 servers, DnsQuerySettings settings, DnsMessageHandler handler, DnsRequestMessage request, LookupClientAudit audit)
|
at DnsClient.LookupClient.QueryInternal(DnsQuestion question, DnsQuerySettings queryOptions, IReadOnlyCollection`1 servers)
|
at MongoDB.Driver.Core.Misc.DnsClientWrapper.ResolveSrvRecords(String service, CancellationToken cancellationToken)
|
at MongoDB.Driver.Core.Clusters.DnsMonitor.Monitor()" }.
|
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
|
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
|
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedAsync(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
|
at MongoDB.Driver.Core.Clusters.Cluster.SelectServerAsync(IServerSelector selector, CancellationToken cancellationToken)
|
at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterSeverSelctionAsync(CancellationToken cancellationToken)
|
at MongoDB.Driver.MongoClient.AreSessionsSupportedAsync(CancellationToken cancellationToken)
|
at MongoDB.Driver.MongoClient.StartImplicitSessionAsync(CancellationToken cancellationToken)
|
at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
|
at MongoDB.Driver.IAsyncCursorSourceExtensions.SingleOrDefaultAsync[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
|
This is caused by AKS sending 2 DNS packets for a query. This is the support ticket we sent to AKS, and they confirmed the issue:
1. Install a custom configmap for coredns using `log.override: log` so that we can see all traffic hitting the coredns pods
2. Run a debian pod and install dnsutils and tcpdump packages
3. Start tcpdump and issue a dig, e.g. `dig microsoft.com`
4. Check that the UDP packet is sent correctly and once to kube-dns service, take note of the DNS request ID:
`21:38:02.987751 IP dotnet-playground-debian.47859 - kube-dns.kube-system.svc.cluster.local.53: 53739+ [1au] A? microsoft.com. (54)`
4. You should expect in the logs of one coredns pod to receive this packet once, but most of the times it is received twice (within the same coredns pod and with the same request ID):
`[INFO] 10.244.0.232:42274 - 53739 'A IN microsoft.com. udp 54 false 4096' NOERROR qr,aa,rd,ra 176 0.000154298s`
`[INFO] 10.244.0.232:42274 - 53739 'A IN microsoft.com. udp 54 false 4096' NOERROR qr,aa,rd,ra 176 0.000225796s`
5. Those two responses are sent back to our debian pod as a response to the dig command:
`21:38:02.989997 IP kube-dns.kube-system.svc.cluster.local.53 - dotnet-playground-debian.47859: 53739 5/0/1 A 13.77.161.179, A 40.76.4.15, A 40.113.200.201, A 40.112.72.205, A 104.215.148.63 (187)`
`21:38:03.102732 IP kube-dns.kube-system.svc.cluster.local.53 - dotnet-playground-debian.47859: 53739 5/0/1 A 40.112.72.205, A 40.113.200.201, A 13.77.161.179, A 40.76.4.15, A 104.215.148.63 (187)`
6. It looks like 75% percent of the DNS requests behaves likes this. We tried some tweaks to the pods' dnsConfig like `single-request-reopen` or changing dnsPolicy, to no avail.
|