<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 08:37:32 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[GODRIVER-1966] Strange delays with failCommand in v1.4.0+</title>
                <link>https://jira.mongodb.org/browse/GODRIVER-1966</link>
                <project id="14289" key="GODRIVER">Go Driver</project>
                    <description>&lt;p&gt;We see delays after upgrading the MongoDB Go driver from `v1.3.2` to `v1.4.0`.&lt;/p&gt;

&lt;p&gt;Attached is the `main.go` used to generate the following output.&lt;/p&gt;

&lt;p&gt;It connects a client, prepares the test data, and then runs 3 read (`find`) and 3 write (`insert`) queries, both with and without the `failCommand` failpoint enabled. Below is the output for `find` with the failpoint enabled.&lt;/p&gt;

&lt;p&gt;Output with `v1.3.2`:&lt;/p&gt;

&lt;p&gt;```&lt;/p&gt;

&lt;p&gt;== Testing Find queries failpoint ENABLED ==&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Time elapsed to do: connect client: 38.833&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 7.497043ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: populate table: 995.285&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: set failpoint: 890.543&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 0: 502.632805ms&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 1: 998.847787ms&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 2: 999.679731ms&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 500.759202ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 1.258652ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clean table post test: 1.381183ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: disconnect client: 1.103236ms&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;```&lt;/p&gt;

&lt;p&gt;First: ~500ms&lt;/p&gt;

&lt;p&gt;Second/third: ~1s&lt;/p&gt;

&lt;p&gt;Output with `v1.4.0`:&lt;/p&gt;

&lt;p&gt;```&lt;/p&gt;

&lt;p&gt;== Testing Find queries failpoint ENABLED ==&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Time elapsed to do: connect client: 41.041&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 5.948902ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: populate table: 946.695&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: set failpoint: 849.93&#181;s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 0: 9.996635035s&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 1: 20.015427043s&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: find query 2: 19.987305077s&lt;br/&gt;
 Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to &apos;failCommand&apos; failpoint&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 10.022557671s&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clear failpoint: 1.460051ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: clean table post test: 1.456988ms&lt;/li&gt;
	&lt;li&gt;Time elapsed to do: disconnect client: 1.322098ms&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;```&lt;/p&gt;

&lt;p&gt;First: ~10s&lt;/p&gt;

&lt;p&gt;Second/third: ~20s&lt;/p&gt;

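&lt;p&gt;Query latency increases roughly 20x: from ~500ms to ~10s for the first query and from ~1s to ~20s for later ones. The first &quot;clear failpoint&quot; step grows by the same factor (~500ms to ~10s).&lt;/p&gt;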
&lt;ol&gt;
	&lt;li&gt;Why does this happen?&lt;/li&gt;
	&lt;li&gt;Why does it appear to double for every query after the first?&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;The same pattern appears for the write (`insert`) queries.&lt;/p&gt;

&lt;p&gt;Impact: we cannot rely on our unit tests, because every query that hits the failCommand failpoint incurs this delay.&lt;/p&gt;</description>
                <environment>Local testing on:&lt;br/&gt;
macOS 10.15.7&lt;br/&gt;
go1.16.3 darwin/amd64&lt;br/&gt;
tested against the mongo 4.4 Docker image &lt;a href=&quot;https://github.com/docker-library/mongo/blob/9db9e3d4704f5d963e424a3894fa740b8ce4ea70/4.4/Dockerfile&quot;&gt;https://github.com/docker-library/mongo/blob/9db9e3d4704f5d963e424a3894fa740b8ce4ea70/4.4/Dockerfile&lt;/a&gt;</environment>
        <key id="1680590">GODRIVER-1966</key>
            <summary>Strange delays with failCommand in v1.4.0+</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="benji.rewis@mongodb.com">Benji Rewis</assignee>
                                    <reporter username="eliesrs@gmail.com">Ethan Lie</reporter>
                        <labels>
                    </labels>
                <created>Wed, 14 Apr 2021 23:36:43 +0000</created>
                <updated>Fri, 27 Oct 2023 20:01:17 +0000</updated>
                            <resolved>Fri, 16 Apr 2021 18:34:13 +0000</resolved>
                                    <version>1.4.0</version>
                                                    <component>API</component>
                    <component>Core API</component>
                                        <votes>1</votes>
                                    <watches>5</watches>
                    <comments>
                            <comment id="3723043" author="benji.rewis" created="Fri, 16 Apr 2021 18:34:13 +0000"  >&lt;p&gt;No problem! Feel free to comment on this ticket with any further questions or concerns.&lt;/p&gt;</comment>
                            <comment id="3723028" author="JIRAUSER1259454" created="Fri, 16 Apr 2021 18:27:40 +0000"  >&lt;p&gt;That all makes sense to me.&lt;/p&gt;

&lt;p&gt;Thanks for the timely responses and advice. I don&apos;t think we have any further questions or issues.&lt;/p&gt;</comment>
                            <comment id="3722544" author="benji.rewis" created="Fri, 16 Apr 2021 15:40:33 +0000"  >&lt;p&gt;Happy to help &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=eliesrs%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;eliesrs@gmail.com&quot;&gt;eliesrs@gmail.com&lt;/a&gt; !&lt;/p&gt;

&lt;p&gt;According to our Server Discovery and Monitoring (SDAM) specification, &#8220;not master&#8221; and &#8220;node is recovering&#8221; errors mark the server as Unknown. You can find the list of error codes and names &lt;a href=&quot;https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#not-master-and-node-is-recovering&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
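
&lt;p&gt;As a rough sketch of that rule, a client could classify a numeric error code as follows. The code lists are transcribed by hand from the Server Discovery and Monitoring specification section linked above and may be incomplete. `marksServerUnknown` is a hypothetical helper, not part of the Go driver API, and it ignores the message-based matching the specification describes for servers that return no error code.&lt;/p&gt;

```go
package main

import "fmt"

// Codes the SDAM specification treats as "not writable primary"
// (formerly "not master") errors. Transcribed by hand; verify against the spec.
var notWritablePrimaryCodes = map[int]string{
	10107: "NotWritablePrimary",
	13435: "NotPrimaryNoSecondaryOk",
	10058: "LegacyNotPrimary",
}

// Codes the SDAM specification treats as "node is recovering" errors.
var nodeIsRecoveringCodes = map[int]string{
	11600: "InterruptedAtShutdown",
	11602: "InterruptedDueToReplStateChange",
	13436: "NotPrimaryOrSecondary",
	189:   "PrimarySteppedDown",
	91:    "ShutdownInProgress",
}

// marksServerUnknown is a hypothetical helper that reports whether a command
// error with this code would cause the server to be marked Unknown. It only
// inspects the numeric code.
func marksServerUnknown(code int) bool {
	_, notPrimary := notWritablePrimaryCodes[code]
	_, recovering := nodeIsRecoveringCodes[code]
	return notPrimary || recovering
}

func main() {
	// 10107 is the code used in the failpoint from this ticket; 202 is the
	// alternative the reporter tested.
	for _, code := range []int{10107, 202} {
		fmt.Printf("code %d marks server Unknown: %v\n", code, marksServerUnknown(code))
	}
}
```

&lt;p&gt;Running it reports `true` for 10107 and `false` for 202, consistent with the reporter&#8217;s observation that only 10107 triggers the delay.&lt;/p&gt;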

&lt;p&gt;As for the doubling latency, one hypothesis is that the error code you&#8217;re using, 10107 (NotWritablePrimary), is retryable for reads such as Find. The Find is sent to the server and 10107 is returned (although without an actual state change), so the server is marked Unknown, and only a heartbeat 10 seconds later reveals that the Find has failed. However, the first Find also retries, which adds another 10 seconds to the timing of the second Find, so the second Find takes 20 seconds in total. The same happens to the third Find because of the retry of the second Find.&lt;/p&gt;
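
&lt;p&gt;The hypothesis can be written down as a back-of-the-envelope model. This is only a sketch of the reasoning in this comment, not of the driver internals. It assumes that (a) each query that hits a failpoint whose error marks the server Unknown waits about one heartbeat interval, and (b) when that error is retried, every query after the first absorbs one extra interval from the previous query&#8217;s retry. The retryable read codes are transcribed from the Retryable Reads specification and may be incomplete, all names are hypothetical, and network round-trip time is ignored.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// Heartbeat intervals discussed in this ticket: the driver default and the
// lower value suggested for tests.
const (
	defaultHeartbeat = 10 * time.Second
	testHeartbeat    = 500 * time.Millisecond
)

// retryableReadCodes lists codes the Retryable Reads specification treats
// as retryable (transcribed by hand; may be incomplete).
var retryableReadCodes = map[int]bool{
	11600: true, // InterruptedAtShutdown
	11602: true, // InterruptedDueToReplStateChange
	10107: true, // NotWritablePrimary
	13435: true, // NotPrimaryNoSecondaryOk
	13436: true, // NotPrimaryOrSecondary
	189:   true, // PrimarySteppedDown
	91:    true, // ShutdownInProgress
	7:     true, // HostNotFound
	6:     true, // HostUnreachable
	89:    true, // NetworkTimeout
	9001:  true, // SocketException
	262:   true, // ExceededTimeLimit
}

// isRetryableRead reports whether a read failing with code would be retried.
func isRetryableRead(code int) bool {
	return retryableReadCodes[code]
}

// modelLatencies returns the modelled latency of n consecutive queries that
// all hit the failpoint. marksUnknown says whether the error marks the server
// Unknown; retried says whether each failed query is retried.
func modelLatencies(n int, heartbeat time.Duration, marksUnknown, retried bool) []time.Duration {
	latencies := make([]time.Duration, n)
	if !marksUnknown {
		// The server stays selectable, so no query waits for a heartbeat.
		return latencies
	}
	// Under the hypothesis, the previous query's retry adds one more
	// heartbeat to every query after the first.
	var retryPenalty time.Duration
	if retried {
		retryPenalty = heartbeat
	}
	for i := range latencies {
		// Each query waits roughly one heartbeat for the server to leave
		// the Unknown state.
		latencies[i] = heartbeat
		if i > 0 {
			latencies[i] += retryPenalty
		}
	}
	return latencies
}

func main() {
	// find, code 10107: marks the server Unknown and is retryable.
	fmt.Println(modelLatencies(3, defaultHeartbeat, true, isRetryableRead(10107)))
	// insert on a standalone, code 10107: marks Unknown, but writes are not retried.
	fmt.Println(modelLatencies(3, defaultHeartbeat, true, false))
	// find, code 202: does not mark the server Unknown.
	fmt.Println(modelLatencies(3, defaultHeartbeat, false, isRetryableRead(202)))
	// find, code 10107, with the 500ms test heartbeat.
	fmt.Println(modelLatencies(3, testHeartbeat, true, isRetryableRead(10107)))
}
```

&lt;p&gt;The four lines print `[10s 20s 20s]`, `[10s 10s 10s]`, `[0s 0s 0s]` and `[500ms 1s 1s]`. The first is close to the v1.4.0 find timings in the description, and the last shows the same pattern scaled down by a 500ms heartbeat.&lt;/p&gt;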

&lt;p&gt;Latency does not double for writes such as Insert, which your code already exercises, because retryable writes are not supported on standalone deployments. Latency also does not double with a non-retryable error code such as 202.&lt;/p&gt;
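
&lt;p&gt;The test program (`main.go`) is not reproduced in this ticket, so the following is only a hypothetical sketch of switching the failpoint to 202 (NetworkInterfaceExceededTimeLimit) instead of 10107. It uses `encoding/json` purely to show the document shape. With the Go driver, the same fields would be sent as a `configureFailPoint` command against the `admin` database, for example as an ordered `bson.D`, since the command name must be the first field. The `alwaysOn` mode and the type and function names are assumptions.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// failCommandData mirrors the "data" field of a failCommand failpoint.
type failCommandData struct {
	FailCommands []string `json:"failCommands"`
	ErrorCode    int      `json:"errorCode"`
}

// failPointCommand mirrors a configureFailPoint command. Field order matters
// for the real command, so a struct (not a map) keeps it stable.
type failPointCommand struct {
	ConfigureFailPoint string          `json:"configureFailPoint"`
	Mode               string          `json:"mode"`
	Data               failCommandData `json:"data"`
}

// failCommandJSON is a hypothetical helper that renders a failCommand
// configuration which fails the given commands with errorCode.
func failCommandJSON(commands []string, errorCode int) (string, error) {
	cmd := failPointCommand{
		ConfigureFailPoint: "failCommand",
		Mode:               "alwaysOn",
		Data: failCommandData{
			FailCommands: commands,
			ErrorCode:    errorCode,
		},
	}
	b, err := json.Marshal(cmd)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// 202 is neither a retryable read code nor an SDAM
	// "not master"/"node is recovering" code.
	doc, err := failCommandJSON([]string{"find", "insert"}, 202)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc)
}
```

&lt;p&gt;Clearing the failpoint afterwards uses the same command with mode `off`.&lt;/p&gt;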

&lt;p&gt;So, if you still wish to avoid the doubling in latency in your test, using a non-retryable error code is probably a good idea.&lt;/p&gt;</comment>
                            <comment id="3721230" author="JIRAUSER1259454" created="Thu, 15 Apr 2021 20:02:33 +0000"  >&lt;p&gt;Hey Benji, thanks for the quick response.&lt;/p&gt;

&lt;p&gt;I think I understand. I also see that an error code that presumably doesn&apos;t mark the server &quot;unknown&quot; (I tested with 202, NetworkInterfaceExceededTimeLimit) doesn&apos;t trigger the delays. Between this and the heartbeat interval, we should be unblocked. Thanks.&lt;/p&gt;

&lt;p&gt;Is there any more information we can provide to help track down the root cause of the doubled latency?&lt;/p&gt;

&lt;p&gt;EDIT: Also, where can we find a list of errors that mark the server this way?&lt;/p&gt;</comment>
                            <comment id="3721168" author="benji.rewis" created="Thu, 15 Apr 2021 19:30:00 +0000"  >&lt;p&gt;Hello &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=eliesrs%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;eliesrs@gmail.com&quot;&gt;eliesrs@gmail.com&lt;/a&gt; ! Thanks for your report. We&#8217;ve reproduced the delay on 1.4.0+.&lt;/p&gt;

&lt;p&gt;In v1.4.0, the Go driver switched from monitoring servers by polling periodically to streaming responses from servers when a state change occurs (with &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-1489&quot; title=&quot;Reduce Client Time To Recovery On Topology Changes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-1489&quot;&gt;&lt;del&gt;GODRIVER-1489&lt;/del&gt;&lt;/a&gt;). Failpoints do not cause an actual server state change; they only simulate one and raise an error. In this case, that error (NotWritablePrimary) marks the server as &#8220;unknown&#8221;, and as of 1.4.0, monitoring won&#8217;t get an updated response from the server until a heartbeat is sent after the default 10 second HeartbeatInterval. This is why your initial queries take 10 seconds to fail.&lt;/p&gt;
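
&lt;p&gt;One way to shorten that wait in tests is to lower the heartbeat interval. Besides the client option `SetHeartbeatInterval` (e.g. `options.Client().SetHeartbeatInterval(500 * time.Millisecond)`), the interval can be set through the standard `heartbeatFrequencyMS` connection string option. The standard-library sketch below adds that option to a simple single-host test URI. The helper name and the `localhost` address are hypothetical, and the 500ms floor is an assumption based on the minimum heartbeat frequency in the SDAM specification.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// minHeartbeat is assumed to be the smallest interval drivers accept
// (the SDAM specification's minimum heartbeat frequency).
const minHeartbeat = 500 * time.Millisecond

// withHeartbeat is a hypothetical helper that returns uri with the
// heartbeatFrequencyMS option set to interval.
func withHeartbeat(uri string, interval time.Duration) (string, error) {
	if minHeartbeat > interval {
		return "", fmt.Errorf("heartbeat interval %v is below the %v minimum", interval, minHeartbeat)
	}
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	// The connection string format expects a "/" before the options.
	if u.Path == "" {
		u.Path = "/"
	}
	q := u.Query()
	q.Set("heartbeatFrequencyMS", strconv.FormatInt(interval.Milliseconds(), 10))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	uri, err := withHeartbeat("mongodb://localhost:27017", minHeartbeat)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(uri)
}
```

&lt;p&gt;This prints `mongodb://localhost:27017/?heartbeatFrequencyMS=500`, which could then be passed to `options.Client().ApplyURI`. Multi-host URIs may not round-trip through `net/url`, so the helper is only meant for simple test setups.&lt;/p&gt;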

&lt;p&gt;It is less clear why the query delay doubles after the first find or write, but for now you should be able to use &lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/blob/master/mongo/options/clientoptions.go#L491-L496&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;SetHeartbeatInterval&lt;/a&gt; on the client to lower the heartbeat interval to something like 500ms, which we do for testing. That should fix the delay on 1.4.0+.&lt;/p&gt;</comment>
                            <comment id="3719621" author="JIRAUSER1259454" created="Wed, 14 Apr 2021 23:39:04 +0000"  >&lt;p&gt;The same delays also occur in v1.5.1 (the latest release).&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="309752" name="main.go" size="5049" author="eliesrs@gmail.com" created="Wed, 14 Apr 2021 23:09:37 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hypv3z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>