<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 08:37:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[GODRIVER-2024] Connection pool, long semaphore wait causes connection close</title>
                <link>https://jira.mongodb.org/browse/GODRIVER-2024</link>
                <project id="14289" key="GODRIVER">Go Driver</project>
                    <description>&lt;h3&gt;&lt;a name=&quot;Problem&quot;&gt;&lt;/a&gt;Problem&lt;/h3&gt;
&lt;p&gt;We found the driver unnecessarily closes connections and clears the connection pool under high load.&lt;/p&gt;

&lt;p&gt;This occurs when the semaphore wait time to acquire a connection approaches the context timeout. If a connection is acquired with little or no context deadline left, the connection is closed because any use of it results in a timeout. After the connection is closed, other goroutines attempt to open a connection with a similarly short deadline; when the new connection cannot be created in time, the entire pool is cleared (its generation is incremented). This vicious cycle repeats, increasing both error rates and cluster CPU usage (spent establishing the new connections).&lt;/p&gt;

&lt;h3&gt;&lt;a name=&quot;ProposedSolution&quot;&gt;&lt;/a&gt;Proposed Solution&lt;/h3&gt;
&lt;ol&gt;
	&lt;li&gt;Publish metrics for connection pool checkout duration (semaphore wait time)&lt;/li&gt;
	&lt;li&gt;Prevent closing connections when the remaining deadline is below a threshold. This can be accomplished in one of two ways:
	&lt;ol&gt;
		&lt;li&gt;Add a client option for minimum connection I/O duration. After acquiring a connection, if the context has a deadline and the remaining duration is below the minimum connection I/O duration, fail fast before attempting to use the connection.&lt;/li&gt;
		&lt;li&gt;Add a client option for maximum connection pool checkout duration (semaphore wait duration). If the context has a deadline and that deadline is further away than the maximum checkout duration, call acquire with a new context whose deadline equals the maximum semaphore wait time.&lt;/li&gt;
	&lt;/ol&gt;
	&lt;/li&gt;
&lt;/ol&gt;
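&lt;p&gt;As a rough illustration of proposals 2.1 and 2.2, here is a minimal Go sketch. The names minConnIODuration, maxCheckoutDuration, checkOut and the acquire callback are hypothetical and are not driver options or APIs; the pool itself is simulated by a function that sleeps. The measured wait is the value that proposal 1 would publish as a metric.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical client options for illustration; neither exists in the driver.
const (
	minConnIODuration   = 50 * time.Millisecond
	maxCheckoutDuration = 200 * time.Millisecond
)

var errDeadlineTooClose = errors.New("remaining deadline is below the minimum connection I/O duration")

// tooCloseToDeadline reports whether the remaining time is shorter than the
// minimum I/O duration (proposal 2.1).
func tooCloseToDeadline(remaining, minIO time.Duration) bool {
	return minIO > remaining
}

// checkOut simulates a pool checkout. acquire stands in for the semaphore
// wait. It returns how long the wait took so the caller can publish it as a
// metric (proposal 1).
func checkOut(ctx context.Context, acquire func(context.Context) error) (time.Duration, error) {
	// Proposal 2.2: cap the semaphore wait. WithTimeout keeps the earlier of
	// the caller's deadline and now+maxCheckoutDuration.
	waitCtx, cancel := context.WithTimeout(ctx, maxCheckoutDuration)
	defer cancel()

	start := time.Now()
	err := acquire(waitCtx)
	waited := time.Since(start)
	if err != nil {
		return waited, err
	}

	// Proposal 2.1: fail fast instead of starting I/O that cannot finish.
	if deadline, ok := ctx.Deadline(); ok {
		if tooCloseToDeadline(time.Until(deadline), minConnIODuration) {
			return waited, errDeadlineTooClose
		}
	}
	return waited, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Millisecond)
	defer cancel()

	// Simulated slow semaphore wait that uses up about half of the deadline.
	slowAcquire := func(context.Context) error {
		time.Sleep(30 * time.Millisecond)
		return nil
	}

	waited, err := checkOut(ctx, slowAcquire)
	fmt.Println("checkout wait:", waited, "error:", err)
}
```

&lt;p&gt;In this sketch the fail-fast path returns an error without touching the connection, so a real pool could hand the connection back for reuse instead of closing it.&lt;/p&gt;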


&lt;p&gt;Example error pattern:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;time=&quot;2021-05-24T14:29:24-07:00&quot; level=info msg=mongo_pool_event activity=true connection_id=0 reason=timeout type=ConnectionCheckOutFailedSemaphore&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;time=&quot;2021-05-24T14:29:24-07:00&quot; level=info msg=mongo_pool_event activity=true connection_id=0 reason=&quot;ProcessHandshakeError: connection() error occured during connection handshake: context deadline exceeded&quot; type=ConnectionPoolCleared&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;One way to reproduce this problem locally is to run a script with high concurrency, a short timeout, and a low maximum connection pool size.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1756708">GODRIVER-2024</key>
            <summary>Connection pool, long semaphore wait causes connection close</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="2" iconUrl="https://jira.mongodb.org/images/icons/priorities/critical.svg">Critical - P2</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="matt.dale@mongodb.com">Matt Dale</assignee>
                                    <reporter username="akahn@tesla.com">Aaron Kahn</reporter>
                        <labels>
                    </labels>
                <created>Tue, 25 May 2021 19:35:21 +0000</created>
                <updated>Thu, 2 Sep 2021 14:35:21 +0000</updated>
                            <resolved>Mon, 26 Jul 2021 19:46:14 +0000</resolved>
                                    <version>1.5.2</version>
                                                    <component>Connections</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="3963833" author="JIRAUSER1259527" created="Mon, 26 Jul 2021 19:45:40 +0000"  >&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2038&quot; title=&quot;Use &amp;quot;ConnectionTimeout&amp;quot; for creating all new connections and background connection creation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2038&quot;&gt;&lt;del&gt;GODRIVER-2038&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2078&quot; title=&quot;Stress test Go driver connections during Evergreen runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;GODRIVER-2078&quot;&gt;&lt;del&gt;GODRIVER-2078&lt;/del&gt;&lt;/a&gt; are in-progress and should resolve the problematic driver behavior described in our communications with the Tesla team. The Tesla team is currently unblocked by running on their own fork with &lt;a href=&quot;https://github.com/teslamotors/mongo-go-driver/pull/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;changes&lt;/a&gt; that create all connections in the background. I&apos;m closing this ticket because we have a resolution path tracked in other tickets.&lt;/p&gt;</comment>
                            <comment id="3860378" author="JIRAUSER1259527" created="Fri, 4 Jun 2021 17:25:14 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=akahn%40tesla.com&quot; class=&quot;user-hover&quot; rel=&quot;akahn@tesla.com&quot;&gt;akahn@tesla.com&lt;/a&gt; Thanks for getting back to me on those questions!&lt;/p&gt;</comment>
                            <comment id="3858652" author="JIRAUSER1260059" created="Thu, 3 Jun 2021 21:19:02 +0000"  >&lt;p&gt;Matt,&lt;br/&gt;
I hadn&apos;t addressed your questions yet. Answers, in order:&lt;br/&gt;
1. The production application that experienced the outage uses a context timeout of 5s.&lt;br/&gt;
2. Yes, we experience timeouts, but they are rare, on the order of 1-5 per day.&lt;br/&gt;
3. Atlas-hosted.&lt;br/&gt;
4. Yes, we are using TLS connections.&lt;/p&gt;</comment>
                            <comment id="3856486" author="JIRAUSER1260059" created="Thu, 3 Jun 2021 04:04:51 +0000"  >&lt;p&gt;@matt.dale Thanks for the follow-up and for creating the additional issues. Good suggestion on MinPoolSize; we have used it in the past, but the app that caused the incident reported above did not have a minimum set.&lt;br/&gt;
We&apos;ve experienced this issue with max pool sizes of both 20 and 100. I&apos;m working on an unrelated problem until late tomorrow and will be back on the driver changes on Friday. I&apos;ll include you on the PR and let you know if I run into any blockers with tests, etc.&lt;/p&gt;</comment>
                            <comment id="3855450" author="JIRAUSER1259527" created="Thu, 3 Jun 2021 00:35:23 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=akahn%40tesla.com&quot; class=&quot;user-hover&quot; rel=&quot;akahn@tesla.com&quot;&gt;akahn@tesla.com&lt;/a&gt; I have a few questions to help me better understand your use case:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;What are the &lt;tt&gt;Context&lt;/tt&gt; timeout durations on the operations that cause the most timeout errors? E.g. 500ms, 1s, 10s?&lt;/li&gt;
	&lt;li&gt;Do you typically encounter some MongoDB operation timeouts when your application/service is operating normally, or do you typically encounter no MongoDB timeouts unless something is abnormal?&lt;/li&gt;
	&lt;li&gt;Is your MongoDB server self-hosted or Atlas-hosted?&lt;/li&gt;
	&lt;li&gt;Are you using TLS connections?&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;As far as mitigations, I recommend setting the client &lt;tt&gt;minPoolSize&lt;/tt&gt; to a value greater than 0. Setting &lt;tt&gt;minPoolSize&lt;/tt&gt; starts a background goroutine that runs once per minute and attempts to maintain at least the configured number of connections in the pool, creating new connections until &lt;tt&gt;minPoolSize&lt;/tt&gt; is reached. The maintenance goroutine creates connections using the timeout configured with &lt;tt&gt;SetConnectTimeout&lt;/tt&gt; or the default connection timeout (30 seconds).&lt;/p&gt;

&lt;p&gt;E.g. client configuration:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;clientOpts := options.&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	Client().&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	ApplyURI(uri).&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	SetMaxPoolSize(20).&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	SetMinPoolSize(10)  // Set the min connection pool size.&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;client, err := mongo.Connect(context.Background(), clientOpts)&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;


&lt;p&gt;Additionally, I&apos;ve created two new tickets that describe driver improvements to reduce the impact of driver-side operation timeouts on the connection pool:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2037&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;GODRIVER-2037&lt;/a&gt; - Don&apos;t clear the connection pool on Context timeout during handshake&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2038&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;GODRIVER-2038&lt;/a&gt; - Use &quot;ConnectionTimeout&quot; for creating all new connections and background connection creation&lt;/li&gt;
&lt;/ul&gt;
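&lt;p&gt;A minimal sketch of the idea behind GODRIVER-2038, assuming nothing about the eventual implementation: the context used to establish a connection is derived from a dedicated connect timeout rather than from the operation context, so a nearly expired operation deadline cannot cut the handshake short. The names dialContext and budget are hypothetical; the 30-second value mirrors the default connection timeout mentioned above.&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// connectTimeout stands in for the value configured with SetConnectTimeout.
const connectTimeout = 30 * time.Second

// dialContext returns the context for establishing a new connection. It is
// derived from context.Background(), not from the operation context, so the
// handshake always gets the full connect timeout.
func dialContext() (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.Background(), connectTimeout)
}

// budget returns the time left before ctx expires, or 0 if it has no deadline.
func budget(ctx context.Context) time.Duration {
	deadline, ok := ctx.Deadline()
	if !ok {
		return 0
	}
	return time.Until(deadline)
}

func main() {
	// An operation context that is about to expire.
	opCtx, cancelOp := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancelOp()

	dialCtx, cancelDial := dialContext()
	defer cancelDial()

	fmt.Println("dial budget exceeds operation budget:", budget(dialCtx) > budget(opCtx))
}
```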


&lt;p&gt;I think the changes to accomplish &lt;a href=&quot;https://jira.mongodb.org/browse/GODRIVER-2038&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;GODRIVER-2038&lt;/a&gt; may be similar to the refactor you proposed using channels and async connection creation. If you&apos;re still working on that improvement, please continue and include me as a reviewer on any PRs.&lt;/p&gt;</comment>
                            <comment id="3848580" author="JIRAUSER1259527" created="Fri, 28 May 2021 21:20:57 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=akahn%40tesla.com&quot; class=&quot;user-hover&quot; rel=&quot;akahn@tesla.com&quot;&gt;akahn@tesla.com&lt;/a&gt; I&apos;ve been able to reproduce a similar problem with many connections being closed due to operation timeouts and then new connections being created with the timeout of the operation context instead of a separate connection timeout. I&apos;m still investigating the reason behind the connections being closed in the first place and looking for any short-term mitigations, but I agree that the design of the connection pool could be significantly improved. Please tag me on any PRs you open, I&apos;d be happy to review them.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</comment>
                            <comment id="3848422" author="JIRAUSER1260059" created="Fri, 28 May 2021 19:49:12 +0000"  >&lt;p&gt;Matt,&lt;br/&gt;
Unfortunately, we had another incident &lt;a href=&quot;https://support.mongodb.com/case/00781659&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://support.mongodb.com/case/00781659&lt;/a&gt; caused by the driver that my proposed solution above would not have mitigated.  I believe there is a fundamental design flaw with the connection pool in that it will attempt to spin up connections using the query context.  I&apos;m taking a stab at rewriting the connection pool management using channels, and purely async connection creation.  My plan is to PR the change from the teslamotors fork back to mainline, but the change is somewhat large.  If you&apos;re up for a preliminary review of a draft, let&apos;s sync up mid next week.&lt;/p&gt;

&lt;p&gt;-Aaron&lt;/p&gt;
</comment>
                            <comment id="3846902" author="JIRAUSER1259527" created="Fri, 28 May 2021 03:08:08 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=akahn%40tesla.com&quot; class=&quot;user-hover&quot; rel=&quot;akahn@tesla.com&quot;&gt;akahn@tesla.com&lt;/a&gt; thanks for reporting this issue! We&apos;re looking into it and will let you know if we have any questions.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10320">
                    <name>Documented</name>
                                                                <inwardlinks description="is documented by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1766838">GODRIVER-2037</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1766841">GODRIVER-2038</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                    <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000vd1QhQAI]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10257" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Documentation Changes</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10250"><![CDATA[Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_14266" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Documentation Changes Summary</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;The proposed solution makes two changes, both of which need additional documentation:&lt;br/&gt;
1. Metrics for connection pool checkout duration (semaphore wait time)&lt;br/&gt;
2. A new configuration option for minimum connection I/O duration or maximum connection checkout (semaphore wait) duration.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>GODRIVER-2145</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hz2ctz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            </customfields>
    </item>
</channel>
</rss>