<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 08:38:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[GODRIVER-2525] Occasional handshake error when using mongodb+srv with mongos pool</title>
                <link>https://jira.mongodb.org/browse/GODRIVER-2525</link>
                <project id="14289" key="GODRIVER">Go Driver</project>
                    <description>&lt;h4&gt;&lt;a name=&quot;Summary&quot;&gt;&lt;/a&gt;Summary&lt;/h4&gt;

&lt;p&gt;About once a day, we see an error like this: connection() error occurred during connection handshake: &lt;tt&gt;dial tcp: lookup foo-bar-mongos.svc.cluster.local on 169.254.25.10:53: no such host&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;We are using version &lt;tt&gt;1.9.1&lt;/tt&gt; of the Go driver with the following setup:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;sharded cluster&lt;/li&gt;
	&lt;li&gt;mongos instances are run as an auto-scaled pool&lt;/li&gt;
	&lt;li&gt;access to mongos is via SRV record&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Because these errors are relatively rare, we assume they occur when one of the mongos instances is starting up or shutting down.&lt;/p&gt;

&lt;p&gt;We suspect the issue is a race between the SRV record and the per-host A records, possibly compounded by DNS caching. This seems like the kind of issue that is better handled inside the driver itself.&lt;/p&gt;

&lt;p&gt;At this time, we cannot offer a simple way to reproduce this issue. If we can help diagnose it, for example by enabling verbose logging and sending you the logs, please let us know what to do.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2114812">GODRIVER-2525</key>
            <summary>Occasional handshake error when using mongodb+srv with mongos pool</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="10300" iconUrl="https://jira.mongodb.org/images/icons/priorities/medium.svg">Unknown</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="benji.rewis@mongodb.com">Benji Rewis</assignee>
                                    <reporter username="petr.ivanov.s@gmail.com">Peter Ivanov</reporter>
                        <labels>
                    </labels>
                <created>Tue, 16 Aug 2022 08:42:06 +0000</created>
                <updated>Fri, 27 Oct 2023 20:01:11 +0000</updated>
                            <resolved>Fri, 23 Dec 2022 12:00:52 +0000</resolved>
                                    <version>1.9.1</version>
                                                    <component>Connections</component>
                    <component>Error Handling</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                    <comments>
                            <comment id="5074700" author="dbeng-pm-bot" created="Fri, 23 Dec 2022 12:00:54 +0000"  >&lt;p&gt;There hasn&apos;t been any recent activity on this ticket, so we&apos;re resolving it. Thanks for reaching out! Please feel free to comment on this if you&apos;re able to provide more information.&lt;/p&gt;</comment>
                            <comment id="5041004" author="benji.rewis" created="Thu, 8 Dec 2022 16:51:32 +0000"  >&lt;p&gt;The Go driver team does not feel that allowing a configurable &lt;tt&gt;rescanSRVInterval&lt;/tt&gt; is a great fix for this situation. While we believe that &quot;knob&quot; does reduce the number of SRV lookup errors, we also think that &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2579&quot; title=&quot;Incorporate connection pool checkout into server selection loop&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2579&quot;&gt;GODRIVER-2579&lt;/a&gt; will almost entirely remove the possibility of errors like the ones you&apos;re seeing being raised to users. We&apos;d rather not expose new API to users to help avoid odd driver behavior, as that API will likely become irrelevant and permanent (removing it post hoc would be backward-breaking) after we&apos;ve fixed the odd driver behavior. If you&apos;re intent on using a reduced SRV rescan interval, we may ask you to rely on your fork of v1.9.4 for the time being.&lt;/p&gt;</comment>
                            <comment id="5036084" author="bozaro@gmail.com" created="Wed, 7 Dec 2022 05:34:30 +0000"  >&lt;blockquote&gt;&lt;p&gt;it may be difficult for most users to reason about which value to use for&#160;&lt;tt&gt;rescanSRVIntervalMS&lt;/tt&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I don&apos;t see this as a problem. If you don&apos;t explicitly need to change this parameter, just leave the default value. Currently, you have to read the code just to find out what the interval is.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Are you still seeing this issue and is that open PR from your team?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;After we reduced the interval from 60 seconds to 30, the error still occurred, but its frequency dropped severalfold (roughly fivefold, though I don&apos;t remember the exact numbers).&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Have you updated your version of the Go driver beyond 1.9.1?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;We now use version 1.9.4 with the changes from this &lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/pull/1136&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PR&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="5030748" author="benji.rewis" created="Mon, 5 Dec 2022 18:30:07 +0000"  >&lt;p&gt;Hello again, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt;. I&apos;m following up on this ticket, as there seems to be an &lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/pull/1136&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;open PR&lt;/a&gt; related to this issue. Is the author someone from your team?&lt;/p&gt;

&lt;p&gt;While making the SRV rescan interval configurable may feasibly solve this issue for you all, we&apos;re hesitant to introduce a new &quot;knob&quot; to the driver: adding a URI option/client option for &lt;tt&gt;rescanSRVIntervalMS&lt;/tt&gt; would be a cross-drivers change, and it may be difficult for most users to reason about which value to use for &lt;tt&gt;rescanSRVIntervalMS&lt;/tt&gt;. We have an upcoming change &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2579&quot; title=&quot;Incorporate connection pool checkout into server selection loop&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2579&quot;&gt;GODRIVER-2579&lt;/a&gt;/&lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2191&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2191&quot;&gt;GODRIVER-2191&lt;/a&gt; (retrying operations if the connection handshake fails) that would probably stop these SRV lookup errors from bubbling up to your application. We would simply retry the handshake, and the retry would probably succeed given the sequence of events &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matt.dale%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;matt.dale@mongodb.com&quot;&gt;matt.dale@mongodb.com&lt;/a&gt; describes in his comment.&lt;/p&gt;

&lt;p&gt;I have three questions:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Are you still seeing this issue and is that open PR from your team?&lt;/li&gt;
	&lt;li&gt;Have you updated your version of the Go driver beyond 1.9.1?&lt;/li&gt;
	&lt;li&gt;How do you feel about waiting for &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2579&quot; title=&quot;Incorporate connection pool checkout into server selection loop&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2579&quot;&gt;GODRIVER-2579&lt;/a&gt;/2191 (currently planned for this quarter) to resolve this issue?&lt;/li&gt;
&lt;/ol&gt;
</comment>
                            <comment id="4986197" author="dbeng-pm-bot" created="Wed, 16 Nov 2022 12:00:43 +0000"  >&lt;p&gt;There hasn&apos;t been any recent activity on this ticket, so we&apos;re resolving it. Thanks for reaching out! Please feel free to comment on this if you&apos;re able to provide more information.&lt;/p&gt;</comment>
                            <comment id="4942866" author="JIRAUSER1259527" created="Tue, 1 Nov 2022 02:13:54 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt; we recently discovered a bug in the SRV polling behavior of the Go Driver that may prevent changes in SRV records from updating the servers that the Go Driver attempts to connect to when the MongoDB connection string includes a username and password (see &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2620&quot; title=&quot;Failure on Hostname Parsing for SRV Polling&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2620&quot;&gt;&lt;del&gt;GODRIVER-2620&lt;/del&gt;&lt;/a&gt; for more details). We&apos;ve fixed the bug and are planning to release the fix with Go Driver versions 1.8.6, 1.9.3, 1.10.4, and 1.11.0 tomorrow.&lt;/p&gt;
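&lt;p&gt;This comment does not describe the details of the GODRIVER-2620 fix. As a general illustration only, the Go sketch below extracts the host name for the SRV lookup from a connection string that contains credentials. It relies on &lt;tt&gt;net/url&lt;/tt&gt; to separate the &lt;tt&gt;username:password@&lt;/tt&gt; part from the host. The helper &lt;tt&gt;srvHostname&lt;/tt&gt; is hypothetical.&lt;/p&gt;

```go
package main

import (
	"errors"
	"net/url"
)

// srvHostname is a hypothetical helper returning the host name used for the
// SRV lookup of a "mongodb+srv://" connection string. url.Parse splits off
// the user information (username:password@), so the credentials are not
// mistaken for part of the host name.
func srvHostname(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	if u.Scheme != "mongodb+srv" {
		return "", errors.New("not a mongodb+srv URI")
	}
	return u.Hostname(), nil
}
```

&lt;p&gt;For example, &lt;tt&gt;srvHostname("mongodb+srv://user:secret@cluster0.example.com/")&lt;/tt&gt; returns &lt;tt&gt;cluster0.example.com&lt;/tt&gt;. As the connection string format requires, reserved characters in the password must be percent-encoded.&lt;/p&gt;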

&lt;p&gt;Do you use a username and password in your MongoDB connection string? If so, please update to one of the fix versions listed above as soon as they are available and see if that prevents or reduces the handshake errors you see. Since you&apos;re using 1.9.1, I recommend updating to version 1.9.3 since it will be the least risky change.&lt;/p&gt;

&lt;p&gt;As for server behavior, MongoDB 5.0 added a &quot;quiesce&quot; mode that&apos;s used during shutdown to allow connected drivers to gracefully remove servers that are shutting down (read more about quiesce mode &lt;a href=&quot;https://www.mongodb.com/docs/manual/reference/command/shutdown/#shutting-down-the-replica-set-primary--secondary--or-mongos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;). If updating to a patched Go Driver version doesn&apos;t help, updating to MongoDB 5.0 may help.&lt;/p&gt;</comment>
                            <comment id="4925061" author="petr.ivanov.s@gmail.com" created="Tue, 25 Oct 2022 12:21:38 +0000"  >&lt;p&gt;Question 1: No, we use a fairly bare-bones cluster on AWS EC2 instances. The mongos instances run in Kubernetes and scale according to load.&lt;/p&gt;

&lt;p&gt;For question 2, I&apos;ll ask a colleague to answer, but it&apos;s worth noting that we run MongoDB 4.4, and shutdown handling may have improved since then. That said, the issue may not be caused by graceful shutdown alone.&lt;/p&gt;</comment>
                            <comment id="4899707" author="JIRAUSER1259527" created="Thu, 13 Oct 2022 00:15:14 +0000"  >&lt;p&gt;Hey &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt;, sorry about the slow reply. I&apos;ve been attempting to reproduce the error you described but have so far been unsuccessful. However, I have a possible sequence of events that could lead to the error:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Initialize a &lt;tt&gt;mongo.Client&lt;/tt&gt; with a &quot;mongodb+srv://&quot; scheme URI that specifies hosts &lt;tt&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;mongos1, mongos2, mongos3&amp;#93;&lt;/span&gt;&lt;/tt&gt;. The &lt;tt&gt;mongo.Client&lt;/tt&gt; creates monitoring connections to hosts &lt;tt&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;mongos1, mongos2, mongos3&amp;#93;&lt;/span&gt;&lt;/tt&gt; and determines that they are all valid &lt;tt&gt;mongos&lt;/tt&gt; instances.&lt;/li&gt;
	&lt;li&gt;Kubernetes removes pod &lt;tt&gt;mongos3&lt;/tt&gt;, removes the associated DNS record &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt;, and removes &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt; from the associated SRV record.&lt;/li&gt;
	&lt;li&gt;Run an operation using the &lt;tt&gt;mongo.Client&lt;/tt&gt;, which selects host &lt;tt&gt;mongos3&lt;/tt&gt; for the operation. The &lt;tt&gt;mongo.Client&lt;/tt&gt; still considers &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt; valid because it hasn&apos;t received any signals from that host.&lt;/li&gt;
	&lt;li&gt;The &lt;tt&gt;mongo.Client&lt;/tt&gt; attempts to create a new connection to &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt; and encounters an error like
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;dial tcp: lookup mongos3.svc.cluster.local on 169.254.25.10:53: no such host&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;&lt;/li&gt;
	&lt;li&gt;The &lt;tt&gt;mongo.Client&lt;/tt&gt; marks &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt; as &quot;Unknown&quot; and prevents it from being selected for subsequent operations.&lt;/li&gt;
	&lt;li&gt;The &lt;tt&gt;mongo.Client&lt;/tt&gt; polls for the SRV record from the original &quot;mongodb+srv://&quot; URI, sees that &lt;tt&gt;mongos3.svc.cluster.local&lt;/tt&gt; is removed, and removes it from its list of servers.&lt;/li&gt;
&lt;/ol&gt;
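&lt;p&gt;The sequence above can be modeled with a deliberately simplified Go sketch of the client&apos;s server list. The &lt;tt&gt;topology&lt;/tt&gt; type and its methods are hypothetical and do not mirror the driver&apos;s internal types. Real monitoring, heartbeats, and server selection are omitted.&lt;/p&gt;

```go
package main

import "sort"

// Server states used in this sketch; a real client tracks richer descriptions.
const (
	stateMongos  = "Mongos"
	stateUnknown = "Unknown"
)

// topology is a hypothetical, simplified view of the client's server list:
// host name mapped to server state.
type topology struct {
	servers map[string]string
}

// newTopology models step 1: every host from the initial SRV lookup has been
// checked and found to be a mongos.
func newTopology(hosts []string) *topology {
	t := new(topology)
	t.servers = map[string]string{}
	for _, h := range hosts {
		t.servers[h] = stateMongos
	}
	return t
}

// selectable returns the hosts that server selection may choose, sorted for
// deterministic output. In step 3 this still includes the removed host.
func (t *topology) selectable() []string {
	var hosts []string
	for h, s := range t.servers {
		if s == stateMongos {
			hosts = append(hosts, h)
		}
	}
	sort.Strings(hosts)
	return hosts
}

// connectionFailed models step 5: after the connection error in step 4, the
// host is marked Unknown and is no longer selectable.
func (t *topology) connectionFailed(host string) {
	if _, ok := t.servers[host]; ok {
		t.servers[host] = stateUnknown
	}
}

// applySRV models step 6: hosts missing from the latest SRV poll are removed.
// Newly listed hosts start as Unknown until they would have been checked.
func (t *topology) applySRV(hosts []string) {
	listed := map[string]bool{}
	for _, h := range hosts {
		listed[h] = true
		if _, ok := t.servers[h]; !ok {
			t.servers[h] = stateUnknown
		}
	}
	for h := range t.servers {
		if !listed[h] {
			delete(t.servers, h)
		}
	}
}
```

&lt;p&gt;Between step 3 and the moment the host stops being selectable, an operation can still be routed to the removed host, so the &lt;tt&gt;no such host&lt;/tt&gt; error can reach the application. The window closes when either a failed connection marks the host Unknown or the next SRV poll removes it.&lt;/p&gt;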



&lt;p&gt;Based on that, I have a few more questions:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Are you using either the MongoDB Enterprise or Community Kubernetes Operator to manage your MongoDB cluster?&lt;/li&gt;
	&lt;li&gt;How is Kubernetes shutting down the &lt;tt&gt;mongos&lt;/tt&gt; process in the pods?&lt;br/&gt;
Typically if &lt;tt&gt;mongos&lt;/tt&gt; is shut down gracefully (e.g. shut down via sending &lt;tt&gt;SIGTERM&lt;/tt&gt; or &lt;tt&gt;SIGINT&lt;/tt&gt; or by running &lt;tt&gt;db.shutdownServer()&lt;/tt&gt;), it signals to connected drivers that it is shutting down before it becomes unavailable. However, it sounds like that is not happening, possibly indicating that &lt;tt&gt;mongos&lt;/tt&gt; is not shutting down gracefully.&lt;/li&gt;
&lt;/ol&gt;
</comment>
                            <comment id="4784211" author="petr.ivanov.s@gmail.com" created="Mon, 29 Aug 2022 15:47:13 +0000"  >&lt;ul&gt;
	&lt;li&gt;Yes, both the backend service and mongos run in Kubernetes.&lt;/li&gt;
	&lt;li&gt;Routing is done via a headless service.&lt;/li&gt;
	&lt;li&gt;Such errors are not very numerous: if by &apos;many&apos; you mean thousands, then no. We rarely see more than a dozen a minute across all our services.&lt;/li&gt;
	&lt;li&gt;We use mongod &lt;tt&gt;4.4.10&lt;/tt&gt;, and most of the mongos pools run &lt;tt&gt;4.4.6&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="4780413" author="JIRAUSER1259527" created="Fri, 26 Aug 2022 18:58:19 +0000"  >&lt;p&gt;Hey &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt; thanks for the ticket, we&apos;re looking into it!&lt;/p&gt;

&lt;p&gt;I&apos;ve got a few questions to help me troubleshoot the issue:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Based on the provided hostname, your &lt;tt&gt;mongos&lt;/tt&gt; pool appears to be running in Kubernetes. Is that correct?&lt;/li&gt;
	&lt;li&gt;When you see those errors, do you typically see a single error or many errors within a small window?&lt;/li&gt;
	&lt;li&gt;What version of MongoDB are you connecting to?&lt;/li&gt;
&lt;/ul&gt;
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i0p9hk:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>