<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 08:36:19 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[GODRIVER-1404] Improve performance of SelectServer</title>
                <link>https://jira.mongodb.org/browse/GODRIVER-1404</link>
                <project id="14289" key="GODRIVER">Go Driver</project>
                    <description>&lt;p&gt;We&apos;re seeing a large fraction of time being spent in server selection; around 30% of the time taken by &quot;driver.Operation.Execute&quot; is being spent in &quot;driver.Operation.selectServer&quot;, vs 45% for &quot;driver.Operation.roundTrip&quot;, which obviously includes network roundtrip(s).&lt;/p&gt;

&lt;p&gt;A large fraction of &quot;driver.Operation.selectServer&quot; seems to be around &quot;Subscribe&quot;, and it seems that could easily be avoided in an initial fast path, where you&apos;d expect a server to always be available. Related to this, we&apos;re also seeing server selection timeouts with a 20ms server selection timeout &amp;#8211; it appears that the subscription logic is sometimes slow enough that we&apos;re hitting that timeout even if server selection could otherwise be satisfied.&lt;/p&gt;</description>
                <environment></environment>
        <key id="992963">GODRIVER-1404</key>
            <summary>Improve performance of SelectServer</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="divjot.arora@mongodb.com">Divjot Arora</assignee>
                                    <reporter username="bartle">David Bartley</reporter>
                        <labels>
                    </labels>
                <created>Thu, 7 Nov 2019 20:22:05 +0000</created>
                <updated>Sat, 28 Oct 2023 11:38:26 +0000</updated>
                            <resolved>Fri, 15 Nov 2019 22:15:15 +0000</resolved>
                                                    <fixVersion>1.2.0</fixVersion>
                                    <component>Server Selection</component>
                                        <votes>0</votes>
                                    <watches>2</watches>
                    <comments>
                            <comment id="2916496" author="divjot.arora" created="Fri, 28 Feb 2020 15:29:57 +0000"  >&lt;p&gt;That makes sense to me. I&apos;ve filed &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-1499&quot; title=&quot;Remove timeout check from server selection fast path&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-1499&quot;&gt;&lt;del&gt;GODRIVER-1499&lt;/del&gt;&lt;/a&gt; for this.&lt;/p&gt;</comment>
                            <comment id="2914899" author="bartle" created="Thu, 27 Feb 2020 19:20:16 +0000"  >&lt;p&gt;Sorry, tbc, I was suggesting removing &lt;b&gt;both&lt;/b&gt; the ctx.Done and timeoutChan checks from selectServerFromDescription.&lt;/p&gt;

&lt;p&gt;I guess it depends on whether you consider the fast-path server selection to be something that could actually be considered blocking? It seems like you&apos;d almost always follow server selection with an actual command, which would check for cancellation before hitting the network?&lt;/p&gt;</comment>
                            <comment id="2914385" author="divjot.arora" created="Thu, 27 Feb 2020 15:53:19 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bartle&quot; class=&quot;user-hover&quot; rel=&quot;bartle&quot;&gt;bartle&lt;/a&gt; Thank you for your feedback on this change. I don&apos;t think removing the &lt;tt&gt;timeoutChan&lt;/tt&gt; check is sufficient, as a user could also provide a timeout through the Context sent to the function. It would be strange if a timeout provided through &lt;tt&gt;SetServerSelectionTimeout&lt;/tt&gt; behaved differently than one in the Context.&lt;/p&gt;

&lt;p&gt;I did consider the possibility of adding a fast path that did no timeout checks so we could ensure that one server selection check is always done, but this brings up the weird case of a user running an operation with an already-cancelled context. In this case, we should fail fast and return an error at first possibly blocking point, which would be server selection. Thoughts?&lt;/p&gt;</comment>
                            <comment id="2913118" author="bartle" created="Wed, 26 Feb 2020 21:35:26 +0000"  >&lt;p&gt;We recently upgraded to 1.3.0, so I didn&apos;t get a chance to test this before, but I&apos;m curious if we should actually be checking &lt;tt&gt;selectionState.timeoutChan&lt;/tt&gt; in &lt;tt&gt;selectServerFromDescription&lt;/tt&gt;?  If you were to set ServerSelectionTimeout to 1ms, if you wanted fail-fast behaviour, it seems there&apos;s some risk that you could still fail server selection even if there were a primary available?&lt;/p&gt;

&lt;p&gt;It also seems redundant, given that selectServerFromSubscription does this check as well.&lt;/p&gt;</comment>
                            <comment id="2544118" author="xgen-internal-githook" created="Fri, 15 Nov 2019 22:15:06 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Divjot Arora&apos;, &apos;username&apos;: &apos;divjotarora&apos;, &apos;email&apos;: &apos;divjot.arora@10gen.com&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-1404&quot; title=&quot;Improve performance of SelectServer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-1404&quot;&gt;&lt;del&gt;GODRIVER-1404&lt;/del&gt;&lt;/a&gt; Remove topology subscription initial server selection (#221)&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/commit/08f9f9e42b104c4ee367feb1f2bfff1def66eb54&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo-go-driver/commit/08f9f9e42b104c4ee367feb1f2bfff1def66eb54&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2537770" author="divjot.arora" created="Tue, 12 Nov 2019 21:44:24 +0000"  >&lt;p&gt;PR:&#160;&lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/pull/221&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo-go-driver/pull/221&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2526423" author="bartle" created="Sat, 9 Nov 2019 00:10:47 +0000"  >&lt;p&gt;Yeah, that looks pretty similar!&lt;/p&gt;</comment>
                            <comment id="2526416" author="divjot.arora" created="Fri, 8 Nov 2019 23:59:55 +0000"  >&lt;p&gt;I can consistently repro by doing 10000 inserts with 10000 goroutines on a 3-node replica set. Note that &lt;tt&gt;Subscribe&lt;/tt&gt; doesn&apos;t consistently show up in the pprof output across runs. I&apos;ve attached my repro script and a sample pprof output from one of the runs that does have &lt;tt&gt;Subscribe&lt;/tt&gt; in it. Can you confirm that this is similar to what you&apos;re seeing?&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/236798/236798_server_selection.go&quot; title=&quot;server_selection.go attached to GODRIVER-1404&quot;&gt;server_selection.go&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/236797/236797_cpu.pprof&quot; title=&quot;cpu.pprof attached to GODRIVER-1404&quot;&gt;cpu.pprof&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="2526339" author="bartle" created="Fri, 8 Nov 2019 22:49:36 +0000"  >&lt;p&gt;No, failover isn&apos;t necessary, we were seeing these timeouts steady-state across all of our 5 node replsets.  I wonder if modifying your test to issue inserts (or even queries) in parallel might be more realistic?&lt;/p&gt;</comment>
                            <comment id="2525332" author="divjot.arora" created="Fri, 8 Nov 2019 16:22:16 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bartle&quot; class=&quot;user-hover&quot; rel=&quot;bartle&quot;&gt;bartle&lt;/a&gt; Thanks for the info. I do think it&apos;s worth figuring out if &lt;tt&gt;Subscribe&lt;/tt&gt; is the issue and why it&apos;s taking so long if it is. I tried to repro this by setting up a &lt;tt&gt;Client&lt;/tt&gt; with a 20ms server selection timeout and running 10,000 serial inserts against the same collection. &lt;tt&gt;pprof&lt;/tt&gt; output does show ~20ms spent in &lt;tt&gt;SelectServer&lt;/tt&gt;, but the bulk of the time is spent in timer creation and pthread condition variable signaling code. On one of the runs, it showed that the bulk of the time was spent in &lt;tt&gt;runtime.duffcopy&lt;/tt&gt;. I have not been able to repro a situation where &lt;tt&gt;Subscribe&lt;/tt&gt; is taking a lot of time.&lt;/p&gt;

&lt;p&gt;These repros were running against a healthy 3-node replica set that did not fail over during program execution. Is this too isolated of an environment to repro what you&apos;re seeing? You mentioned the low server selection timeout was to fail fast during a failover. Is a failover necessary to repro the server selection timeout?&lt;/p&gt;</comment>
                            <comment id="2523641" author="bartle" created="Thu, 7 Nov 2019 23:37:04 +0000"  >&lt;p&gt;We&apos;re seeing this in an internal proxy, so I&apos;m not sure I can share much more detail. We&apos;re setting ServerSelectionTimeout to 20ms (we basically want to fail requests fast and let clients handle retries, typically during failovers) and are seeing regular &quot;server selection timeout&quot; errors. I added some additional logging and confirmed that we&apos;re hitting &lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/blob/85a8e363c138e7c405976deb9816c72efe54f7b4/x/mongo/driver/topology/topology.go#L402&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo-go-driver/blob/85a8e363c138e7c405976deb9816c72efe54f7b4/x/mongo/driver/topology/topology.go#L402&lt;/a&gt; in the first loop iteration, despite the server selection being satisfiable.&lt;/p&gt;</comment>
                            <comment id="2523348" author="divjot.arora" created="Thu, 7 Nov 2019 21:38:24 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bartle&quot; class=&quot;user-hover&quot; rel=&quot;bartle&quot;&gt;bartle&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for the report. Can you provide some more details about the perf issues you&apos;re seeing with &lt;tt&gt;SelectServer&lt;/tt&gt;? The &lt;tt&gt;Subscribe&lt;/tt&gt; function does take &lt;tt&gt;t.subLock&lt;/tt&gt; which could be a potential perf issue but otherwise does not do much work (gets current description, updates a map, and makes a channel). Some concrete numbers or another type of profiling output might help us figure out what&apos;s causing the issue.&lt;/p&gt;

&lt;p&gt;&amp;#8211; Divjot&lt;/p&gt;</comment>
                            <comment id="2523297" author="bartle" created="Thu, 7 Nov 2019 21:14:09 +0000"  >&lt;p&gt;I suspect &lt;a href=&quot;https://github.com/mongodb/mongo-go-driver/blob/e82d777bfb1457703cc5b88d7b8f8bb8cc22a42a/x/mongo/driver/topology/topology.go#L406-L411&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo-go-driver/blob/e82d777bfb1457703cc5b88d7b8f8bb8cc22a42a/x/mongo/driver/topology/topology.go#L406-L411&lt;/a&gt; is also redundant, since things like &quot;ReadPrefSelector&quot; ultimately use &quot;selectByKind&quot;, which obviously filters by e.g. RSPrimary or RSSecondary already.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="236797" name="cpu.pprof" size="9742" author="divjot.arora@mongodb.com" created="Fri, 8 Nov 2019 23:59:29 +0000"/>
                            <attachment id="236798" name="server_selection.go" size="1393" author="divjot.arora@mongodb.com" created="Fri, 8 Nov 2019 23:59:29 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hvlng7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>