<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:40:31 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-56917] Stuck Hello request may lead to cluster outage</title>
                <link>https://jira.mongodb.org/browse/SERVER-56917</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;&lt;b&gt;Background&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;We had an outage in 4.0 cluster when a hardware/OS outage at one shard primary server manifested as stuck Hello request resulted to sharded cluster outage. The culprit was in RSM that was blocked in thread and accumulated unprocessed requests eventually becoming unresponsive for all shards.&lt;/p&gt;

&lt;p&gt;In order to verify similar vulnerability in other branches the new pass-through test was also ported to current head (5.0). The test shows somewhat different, and yet critical bug.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Test results&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;If the attached diff is applied, the following RSM outage is reproduced:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Fail injection is configured to delay Hello response indefinitely at primary&lt;/li&gt;
	&lt;li&gt;Primary is forced to step down and new primary is forced to step up&lt;/li&gt;
	&lt;li&gt;Mongos is unable to detect new primary by entering an infinite loop:&lt;/li&gt;
	&lt;li&gt;Hello request to old (dysfunctional) primary fails with NetworkInterfaceExceededTimeLimit&lt;/li&gt;
	&lt;li&gt;RSM starts another Hello to the same server without even trying other servers&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Apparently all our branches are vulnerable to this bug one way or another. This ticket is for 5.0 that should be backported to at least 4.4. The fix for 4.0 and 4.2 is a separate &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-56854&quot; title=&quot;Provide the ability for RSM requests to timeout and mark the server as failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-56854&quot;&gt;&lt;del&gt;SERVER-56854&lt;/del&gt;&lt;/a&gt; as the code and fix are quite different.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1714670">SERVER-56917</key>
            <summary>Stuck Hello request may lead to cluster outage</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13202">Works as Designed</resolution>
                                        <assignee username="andrew.shuvalov@mongodb.com">Andrew Shuvalov</assignee>
                                    <reporter username="andrew.shuvalov@mongodb.com">Andrew Shuvalov</reporter>
                        <labels>
                            <label>sharding-product-sync</label>
                    </labels>
                <created>Thu, 13 May 2021 15:04:01 +0000</created>
                <updated>Fri, 27 Oct 2023 13:52:23 +0000</updated>
                            <resolved>Fri, 16 Jul 2021 14:40:43 +0000</resolved>
                                    <version>5.0.0</version>
                                                                        <votes>0</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="3947017" author="JIRAUSER1256988" created="Fri, 16 Jul 2021 14:40:43 +0000"  >&lt;p&gt;The integration test was ported to other branches and proved that the bug existed only in the 4.0 branch.&#160; In other branches, the root cause of the suspected problem was that the fail injection was preventing the test infrastructure to setup properly, which was hard to separate from the actual bug.&lt;/p&gt;</comment>
                            <comment id="3771767" author="lamont.nelson" created="Thu, 13 May 2021 15:18:46 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-56854&quot; title=&quot;Provide the ability for RSM requests to timeout and mark the server as failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-56854&quot;&gt;&lt;del&gt;SERVER-56854&lt;/del&gt;&lt;/a&gt; is the 4.0 and 4.2 version of this ticket, which is being worked on right now.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1710876">SERVER-56854</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="314775" name="hello_delay.diff" size="11152" author="andrew.shuvalov@mongodb.com" created="Thu, 13 May 2021 15:06:11 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 13 May 2021 15:18:46 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 29 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 29 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>andrew.shuvalov@mongodb.com</customfieldvalue>
            <customfieldvalue>lamont.nelson@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzakrb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hyvc3r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="4521">Sharding 2021-07-12</customfieldvalue>
    <customfieldvalue id="4522">Sharding 2021-07-26</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Apply diff and run:&lt;/p&gt;

&lt;p&gt;buildscripts/resmoke.py run --suite=sharding_hello_failures jstests/concurrency/fsm_workloads/update_array.js&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hza70f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>