<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:51:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-19550] Balancer can cause cascading mongod failures during network partitions</title>
                <link>https://jira.mongodb.org/browse/SERVER-19550</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Under the following conditions, the balancer can cause up to N mongod primary failures, where N is the number of collections currently eligible for balancing.&lt;/p&gt;

&lt;p&gt;1. The balancer tries to move a chunk&lt;br/&gt;
2. During the moveChunk, the FROM primary enters the critical section&lt;br/&gt;
3. During the critical section, the FROM primary is partitioned from the first config server&lt;br/&gt;
4. The primary will crash&lt;br/&gt;
5. If the balancing mongos is still able to reach the first config server, it will release its balancing lock&lt;br/&gt;
6. If 5 happened, goto 1&lt;/p&gt;

&lt;p&gt;After step 4, the collection lock will still be held on the collection that was being balanced at the time of the crash. This lock will expire after 24 hours, but during that time the collection is no longer eligible for balancing.&lt;/p&gt;

&lt;p&gt;Critically, this failure cannot cascade to multiple nodes if the first config server is also partitioned from the balancing mongos. When this happens, the balancing mongos is unable to release its balancing lock, so balancing cannot continue. This is a good thing, as it prevents cascading primary failures.&lt;/p&gt;

&lt;p&gt;Since this failure can only happen if the balancing mongos is still able to read and write from the config server, it seems like some more intelligent reading of the current lock state by that mongos would allow us to prevent the cascading failure. Perhaps something like this?&lt;/p&gt;

&lt;p&gt;1. Balancing mongos realizes the migration has failed&lt;br/&gt;
2. Balancing mongos checks the collection lock on the collection it was just trying to migrate&lt;br/&gt;
3. If that lock is still held, mongos can reason that something has gone wrong and refuse to relinquish its balancing lock as a precaution&lt;/p&gt;

&lt;p&gt;If the balancing mongos did this, it would prevent the cascading failure and give us 24 hours to respond to the issue before any other primaries could die.&lt;/p&gt;</description>
                <environment>Repro&amp;#39;d with Mongo 3.0.4 WT inside Docker and TCL 6.3</environment>
        <key id="222373">SERVER-19550</key>
            <summary>Balancer can cause cascading mongod failures during network partitions</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="travis@gamechanger.io">Travis Thieman</reporter>
                        <labels>
                            <label>ShardingRoughEdges</label>
                    </labels>
                <created>Thu, 23 Jul 2015 15:36:07 +0000</created>
                <updated>Tue, 6 Dec 2022 04:47:41 +0000</updated>
                            <resolved>Mon, 9 Dec 2019 15:23:48 +0000</resolved>
                                    <version>3.0.4</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>10</watches>
                                                                                                                <comments>
                            <comment id="2593770" author="sheeri.cabral" created="Mon, 9 Dec 2019 15:23:48 +0000"  >&lt;p&gt;gone away since the balancer has moved to the config servers.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="83184" name="mongo-cluster.yml" size="892" author="travis@gamechanger.io" created="Thu, 23 Jul 2015 15:36:07 +0000"/>
                            <attachment id="83185" name="test.sh" size="3474" author="travis@gamechanger.io" created="Thu, 23 Jul 2015 15:36:07 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 23 Jul 2015 15:55:44 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 9 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 9 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>sheeri.cabral</customfieldvalue>
            <customfieldvalue>travis@gamechanger.io</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrkzpz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrforr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Repro can be achieved using the Docker Compose spec and script attached to this ticket.&lt;/p&gt;

&lt;p&gt;Public Gist of those same files is available here: &lt;a href=&quot;https://gist.github.com/thieman/e28d426d3415c30caf38&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/thieman/e28d426d3415c30caf38&lt;/a&gt;&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsftxr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>