<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Wed Feb 07 21:21:12 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[CDRIVER-4532] Add retry behavior for handshake failures in mongoc_cluster_stream_for_server</title>
                <link>https://jira.mongodb.org/browse/CDRIVER-4532</link>
                <project id="10030" key="CDRIVER">C Driver</project>
                    <description>&lt;p&gt;Quoting &lt;a href=&quot;https://github.com/mongodb/mongo-c-driver/pull/1141&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;mongodb/mongo-c-driver#1141&lt;/a&gt; for &lt;a href=&quot;https://jira.mongodb.org/browse/CDRIVER-4192&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CDRIVER-4192&quot;&gt;&lt;del&gt;CDRIVER-4192&lt;/del&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Note, &lt;tt&gt;mongoc_cluster_stream_for_server&lt;/tt&gt;, which is used for other miscellaneous connections to the server, is deliberately not involved in these changes. This function may eventually require similar changes according to &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-2063&quot; title=&quot;Handle write errors differently depending on whether the outcome is known&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-2063&quot;&gt;DRIVERS-2063&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/CDRIVER-4192&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CDRIVER-4192&quot;&gt;&lt;del&gt;CDRIVER-4192&lt;/del&gt;&lt;/a&gt; implemented retry logic for handshake failures in all &lt;tt&gt;mongoc_cluster_stream_for_*&lt;/tt&gt; methods &lt;em&gt;except&lt;/em&gt; &lt;tt&gt;mongoc_cluster_stream_for_server&lt;/tt&gt;. The PHP driver exclusively uses &lt;tt&gt;mongoc_cluster_stream_for_server&lt;/tt&gt; for executing operations, as it performs its own server selection and then specifies a &lt;tt&gt;server_id&lt;/tt&gt; option to libmongoc&apos;s execute methods.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2201848">CDRIVER-4532</key>
            <summary>Add retry behavior for handshake failures in mongoc_cluster_stream_for_server</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="10300" iconUrl="https://jira.mongodb.org/images/icons/priorities/medium.svg">Unknown</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="jmikola@mongodb.com">Jeremy Mikola</reporter>
                        <labels>
                    </labels>
                <created>Mon, 5 Dec 2022 09:35:38 +0000</created>
                <updated>Tue, 16 Jan 2024 15:57:34 +0000</updated>
                                                                                                <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="5106815" author="JIRAUSER1261413" created="Wed, 11 Jan 2023 18:33:25 +0000"  >&lt;p&gt;Moving to backlog, pending further updates in &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-746&quot;&gt;DRIVERS-746&lt;/a&gt; or related tickets regarding concrete actions to take regarding this issue.&lt;/p&gt;</comment>
                            <comment id="5048751" author="kevin.albertson" created="Mon, 12 Dec 2022 19:30:53 +0000"  >&lt;blockquote&gt;&lt;p&gt;My read of the spec also suggests that handshakes should never be retried for aggregate with $out/$merge, since neither spec considers those to be retryable. If you agree, I think this warrants a separate CDRIVER ticket to change the logic for whether handshakes should be retried to also consider the operation itself (more important for writes IMO, as I think all read ops are retryable).&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Given the comment on &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-746&quot;&gt;DRIVERS-746&lt;/a&gt;, I assume this is no longer a concern.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Given that, I think it makes sense to change the execution methods I referenced above (and potentially others) to use a stream constructor that conditionally allows handshake retries and then use _stream_for_server (i.e. never retries handshakes) internally when you need to create a new stream for retrying an operation.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The handshake retry may be invalidated by DRIVERS-555 or DRIVERS-1262, so I prefer not to add much additional complexity. But the handshake retry may still be a benefit to users until DRIVERS-555 or DRIVERS-1262 is addressed. This seems like a reasonable solution in the meantime, i.e. if a serverId option is passed, enable handshake retry in &lt;tt&gt;mongoc_cluster_stream_for_server&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="5042824" author="jmikola@gmail.com" created="Fri, 9 Dec 2022 04:04:35 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ezra.chung%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;ezra.chung@mongodb.com&quot;&gt;ezra.chung@mongodb.com&lt;/a&gt;: See my &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746?focusedCommentId=5042820&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-5042820&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;comment in DRIVERS-746&lt;/a&gt;. In paritcular:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;I think the original libmongoc implementation (&lt;a href=&quot;https://jira.mongodb.org/browse/CDRIVER-4192&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CDRIVER-4192&quot;&gt;&lt;del&gt;CDRIVER-4192&lt;/del&gt;&lt;/a&gt;) inadvertently introduced logic to retry handshakes for all operations (handshake retry eligibility is determined by the URI options alone and doesn&apos;t consider the operation itself).&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This would have been caught by spec tests in &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-2247&quot; title=&quot;Add tests for non-retryable handshake errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-2247&quot;&gt;DRIVERS-2247&lt;/a&gt;, which don&apos;t yet exist. My read of the spec also suggests that handshakes should &lt;em&gt;never&lt;/em&gt; be retried for aggregate with $out/$merge, since neither spec considers those to be retryable. If you agree, I think this warrants a separate CDRIVER ticket to change the logic for whether handshakes should be retried to &lt;em&gt;also&lt;/em&gt; consider the operation itself (more important for writes IMO, as I think all read ops are retryable).&lt;/p&gt;</comment>
                            <comment id="5042783" author="jmikola@gmail.com" created="Fri, 9 Dec 2022 03:43:17 +0000"  >&lt;blockquote&gt;&lt;p&gt;I am unable to discern what line you are referring to with the given link, but the link to the original spec change is indeed why the retryableWriteError label is conditionally applied.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Looks like the diff link doesn&apos;t work because GitHub defaults to hiding &lt;tt&gt;mongoc-cluster.c&lt;/tt&gt;. I was referring to &lt;a href=&quot;https://github.com/mongodb/mongo-c-driver/blob/30fdc1c12dc7e4c801649eb2e5e4c245814f88da/src/libmongoc/src/mongoc/mongoc-cluster.c#L2882&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this portion of &lt;tt&gt;_mongoc_cluster_stream_for_optype&lt;/tt&gt;&lt;/a&gt; where you add a label iff the operation type is not &lt;tt&gt;MONGOC_SS_READ&lt;/tt&gt;.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;I was not aware of the &quot;only add the label to an error when the client has added a txnNumber to the command&quot; description in the Q&amp;amp;A section.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I&apos;m not sure how relevant this is, and there is evidently no test coverage for this. I would have expected there to be a test using a non-retryable write (e.g. aggregation with &lt;tt&gt;$out&lt;/tt&gt;) and expecting no error label to be added; however, there don&apos;t appear to be &lt;em&gt;any&lt;/em&gt; handshakeError tests that expect operations to fail &amp;#8211; everything is &quot;&amp;lt;operation&amp;gt; succeeds after retryable handshake server error&quot;.&lt;/p&gt;

&lt;p&gt;Looking back at the &lt;a href=&quot;https://github.com/mongodb/specifications/commit/082306075eb2d125c47503e94fabc42f00a16784&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;original commit for DRIVERS-746&lt;/a&gt;, I think the spec contradicts itself. The newly introduced paragraph talks about adding the label based on the &lt;tt&gt;retryWrites&lt;/tt&gt; URI option, but the Q&amp;amp;A section talks about only doing this for retryable writes (with a &lt;tt&gt;txnNumber&lt;/tt&gt; field). I think it&apos;s just a coincidence that the commit happened to modify a line in that Q&amp;amp;A entry (just whitespace). I&apos;ll follow up on &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-746&quot;&gt;DRIVERS-746&lt;/a&gt; to ask for clarification on this (and potentially open a new ticket if that seems prudent).&lt;/p&gt;

&lt;p&gt;As for tests to assert that the handshake&apos;s retry attempt prevents the subsequent operation from retrying, I suppose that would not apply in drivers that implement CSOT. Likewise, there&apos;d be no way to assert that the handshake failure adds a label (or does not) since CSOT would ultimately allow the subsequent operation to succeed.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;I think I misunderstood the purpose of &lt;tt&gt;mongoc_cluster_stream_for_aggr_with_write&lt;/tt&gt;. This is necessary because it&apos;s a write operation that also takes a read preference, correct?&lt;/p&gt;

&lt;p&gt;Regarding &lt;tt&gt;MONGOC_SS_AGGREGATE_WITH_WRITE&lt;/tt&gt;: looking back at the logic for adding a RetryableWriteError label, it looks like that might also apply to aggregation writes. That may be a bug, although quite minor as I don&apos;t think the label has much impact other than confusing the user.&lt;/p&gt;

&lt;p&gt;I&apos;m unsure about &lt;tt&gt;mongoc_cluster_stream_for_aggr_with_write&lt;/tt&gt; consulting the &lt;tt&gt;retryWrites&lt;/tt&gt; URI option for initialization of &lt;tt&gt;is_retryable&lt;/tt&gt;, which is later used to determine whether handshakes can be retried. AFAICT, the specs don&apos;t really provide any guidance on what to do there. Aggregations are not discussed at all in the retryable writes spec and pipelines containing &lt;tt&gt;$out&lt;/tt&gt; and &lt;tt&gt;$merge&lt;/tt&gt; are explicitly designated as &lt;em&gt;not&lt;/em&gt; retryable in the reads spec.&lt;/p&gt;

&lt;p&gt;I think this is something we could clarify when following up on &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746&quot; title=&quot;Drivers should retry operations if connection handshake fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-746&quot;&gt;DRIVERS-746&lt;/a&gt;, since it&apos;s the same question of how do we handle handshake retries for operations that are otherwise not retryable (i.e. writes with no &lt;tt&gt;txnNumber&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;It seems unfortunate that handshake retries are being conflated with retryable reads/writes (echoing Matt&apos;s sentiment in &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746?focusedCommentId=4428138&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-4428138&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;this comment&lt;/a&gt;). Talking this through, I now understand why Daria highlighted &lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-2247&quot; title=&quot;Add tests for non-retryable handshake errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DRIVERS-2247&quot;&gt;DRIVERS-2247&lt;/a&gt; (&lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746?focusedCommentId=4507855&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-4507855&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;comment&lt;/a&gt;) and Patrick referred to DRIVERS-1262 (&lt;a href=&quot;https://jira.mongodb.org/browse/DRIVERS-746?focusedCommentId=4430612&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-4430612&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;comment&lt;/a&gt;).&lt;/p&gt;

&lt;hr /&gt;

&lt;blockquote&gt;&lt;p&gt;my interpretation of the Retryable Reads/Writes specification suggested that only the first connection to a server during server selection via _stream_for_reads or _stream_for_writes was eligible for retryable handshakes according to the rule that &quot;a single retryable handshake error makes any following operations ineligible for retryability&quot;. Therefore, _stream_for_server was omitted from retryable handshakes, as retryable handshakes appeared to always be handled by the first connection attempt via _stream_for_reads and _stream_for_writes. My interpretation on this may have been incorrect given PHP&apos;s use of _stream_for_server. Thoughts?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This sounds like something the spec is ambiguous about, and potentially may never be addressed now that the spec has been modified to consider CSOT and all previous mentions of &quot;retry-once&quot; behavior removed.&lt;/p&gt;

&lt;p&gt;That said, I think your understanding is correct given the following:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;If the initial stream creation encounters a handshake error and retries, then there will be no subsequent attempts to retry for the write operation. I think this is handled by you currently tracking whether a retry occurred on the stream struct.&lt;/li&gt;
	&lt;li&gt;If initial stream creation encountered &lt;em&gt;no&lt;/em&gt; error and the operation itself fails, we return to create a new stream. In this case I think it also makes sense to prohibit retrying on a handshake error because the operation failure is consuming our one allowable retry.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Given that, I think it makes sense to change the execution methods I referenced above (and potentially others) to use a stream constructor that conditionally allows handshake retries and then use &lt;tt&gt;_stream_for_server&lt;/tt&gt; (i.e. never retries handshakes) internally when you need to create a new stream for retrying an operation.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="1902491">PHPLIB-1042</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2172063">PHPLIB-1033</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1902485">CDRIVER-4192</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_21553" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Quarter</customfieldname>
                        <customfieldvalues>
                                        <label>FY25Q1</label>
            <label>FY25Q2</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i13uaw:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>