<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:36:50 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-55573] Deadlock between stepdown and chunk migration</title>
                <link>https://jira.mongodb.org/browse/SERVER-55573</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;When this deadlock occurs, the MigrationDestinationManager holds the session checked out in what it calls &quot;outerOpCtx&quot;. It then dispatches other threads with other opCtxs to do work on its behalf (in _migrateDriver()). Those opCtxs will not be killed by killSessions, because they do not have the session checked out. The resulting sequence is:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;outerOpCtx holds the session but is otherwise unused. In fact, it&apos;s not on a thread, because an AlternativeClientRegion has been used.&lt;/li&gt;
	&lt;li&gt;Stepdown kills all user operations and all system operations marked to be killable on stepdown.&lt;/li&gt;
	&lt;li&gt;_migrateDriver() (either cloneDocuments or _applyMigrateOp) creates a new operation.&lt;/li&gt;
	&lt;li&gt;Stepdown kills all sessions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point we&apos;re stuck: the outerOpCtx doesn&apos;t receive the kill because it&apos;s swapped out of its thread, and the new operation doesn&apos;t receive the kill because it&apos;s not associated with the session. The new operation gets stuck waiting for the RSTL, the stepdown thread gets stuck waiting for the session to be checked in, and we have a deadlock.&lt;/p&gt;

&lt;p&gt;I can see a few ways to fix this. One way would be to officially allow opCtxs to do work on behalf of a session they didn&apos;t have checked out; they would then get kills delivered to them (and assigning an opCtx to an already-killed session would auto-kill it). The accounting might get ugly. We could also do something like what PrimaryOnlyService does, which is basically the same, only done &quot;manually&quot;: register each opCtx created during migration somewhere. Then the outerOpCtx, instead of being swapped out, waits for a kill. When it gets one, it kills all registered opCtxs.&lt;/p&gt;
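
&lt;p&gt;A minimal Python sketch of the registration idea, under stated assumptions: Operation, OperationRegistry, run_outer and demo are hypothetical names for illustration and are not part of the server code. Each operation created during migration is registered, the outer operation waits for its kill and forwards it to everything registered, and an operation registered after the kill is killed immediately.&lt;/p&gt;

```python
import threading


class Operation:
    """Hypothetical stand-in for an opCtx: it can only be marked as killed."""

    def __init__(self, name):
        self.name = name
        self._killed = threading.Event()

    def kill(self):
        self._killed.set()

    def is_killed(self):
        return self._killed.is_set()

    def wait_for_kill(self, timeout=None):
        return self._killed.wait(timeout)


class OperationRegistry:
    """Hypothetical registry of the operations created on behalf of one session."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ops = []
        self._killed = False

    def register(self, op):
        # Record the operation and read the kill flag under the same lock,
        # so an operation is either seen by kill_all() or killed right here.
        with self._lock:
            self._ops.append(op)
            already_killed = self._killed
        if already_killed:
            op.kill()

    def kill_all(self):
        # Set the flag and snapshot the list under the lock, then kill outside it.
        with self._lock:
            self._killed = True
            ops = list(self._ops)
        for op in ops:
            op.kill()


def run_outer(outer, registry):
    # The outer operation stays on a thread, waits for its kill and forwards it.
    outer.wait_for_kill()
    registry.kill_all()


def demo():
    outer = Operation("outerOpCtx")
    registry = OperationRegistry()
    cloner = Operation("cloneDocuments")
    registry.register(cloner)
    waiter = threading.Thread(target=run_outer, args=(outer, registry))
    waiter.start()
    outer.kill()  # stands in for stepdown killing the session holder
    waiter.join()
    late = Operation("_applyMigrateOp")
    registry.register(late)  # registered after the kill, so killed at once
    return cloner.is_killed(), late.is_killed()
```

&lt;p&gt;In this sketch, demo() reports both the already-registered operation and the late one as killed. The actual fix would live in the C++ migration code and may look quite different.&lt;/p&gt;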

&lt;p&gt;Or we could have the kill loop in shutdown time out if a session isn&apos;t killed in time, and loop back and kill the operations again. This is inelegant and runs the risk of livelock, though.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1660711">SERVER-55573</key>
            <summary>Deadlock between stepdown and chunk migration</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="pierlauro.sciarelli@mongodb.com">Pierlauro Sciarelli</assignee>
                                    <reporter username="matthew.russotto@mongodb.com">Matthew Russotto</reporter>
                        <labels>
                            <label>sharding-wfbf-sprint</label>
                    </labels>
                <created>Fri, 26 Mar 2021 20:37:11 +0000</created>
                <updated>Sun, 29 Oct 2023 21:55:41 +0000</updated>
                            <resolved>Tue, 25 May 2021 11:26:07 +0000</resolved>
                                                    <fixVersion>4.4.7</fixVersion>
                    <fixVersion>5.0.0-rc1</fixVersion>
                    <fixVersion>5.1.0-rc0</fixVersion>
                                    <component>Replication</component>
                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>13</watches>
                                                                                                                <comments>
                            <comment id="3893999" author="xgen-internal-githook" created="Wed, 23 Jun 2021 14:27:49 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Cheahuychou Mao&apos;, &apos;email&apos;: &apos;mao.cheahuychou@gmail.com&apos;, &apos;username&apos;: &apos;cheahuychou&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55573&quot; title=&quot;Deadlock between stepdown and chunk migration&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55573&quot;&gt;&lt;del&gt;SERVER-55573&lt;/del&gt;&lt;/a&gt; Deadlock between stepdown and chunk migration&lt;br/&gt;
Branch: v4.4&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/fbdfaa2530248b18b4327527f08d83eb283f67a2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/fbdfaa2530248b18b4327527f08d83eb283f67a2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="3850578" author="xgen-internal-githook" created="Tue, 1 Jun 2021 09:46:28 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Pierlauro Sciarelli&apos;, &apos;email&apos;: &apos;pierlauro.sciarelli@mongodb.com&apos;, &apos;username&apos;: &apos;pierlauro&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55573&quot; title=&quot;Deadlock between stepdown and chunk migration&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55573&quot;&gt;&lt;del&gt;SERVER-55573&lt;/del&gt;&lt;/a&gt; Deadlock between stepdown and chunk migration (BACKPORT-9251)&lt;br/&gt;
Branch: v5.0&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/6b50fd7d4bbdf4bb4c9a8dde055aa531f0780191&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/6b50fd7d4bbdf4bb4c9a8dde055aa531f0780191&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="3836104" author="xgen-internal-githook" created="Tue, 25 May 2021 11:16:13 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Pierlauro Sciarelli&apos;, &apos;email&apos;: &apos;pierlauro.sciarelli@mongodb.com&apos;, &apos;username&apos;: &apos;pierlauro&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55573&quot; title=&quot;Deadlock between stepdown and chunk migration&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55573&quot;&gt;&lt;del&gt;SERVER-55573&lt;/del&gt;&lt;/a&gt; Deadlock between stepdown and chunk migration&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/284ecabb7ec2d82cfc0f4b31090df4cfeb4c99b6&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/284ecabb7ec2d82cfc0f4b31090df4cfeb4c99b6&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="3700781" author="esha.maharishi@10gen.com" created="Mon, 5 Apr 2021 14:52:08 +0000"  >&lt;p&gt;This was a good find, thank you &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.russotto&quot; class=&quot;user-hover&quot; rel=&quot;matthew.russotto&quot;&gt;matthew.russotto&lt;/a&gt;. I wanted to mention that shortly after releasing 4.4, we fixed two similar bugs (&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-48689&quot; title=&quot;MigrationDestinationManager waits for thread to join with session checked out&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-48689&quot;&gt;&lt;del&gt;SERVER-48689&lt;/del&gt;&lt;/a&gt;, &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-48641&quot; title=&quot;Deadlock due to the MigrationDestinationManager waiting for write concern with the session checked-out&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-48641&quot;&gt;&lt;del&gt;SERVER-48641&lt;/del&gt;&lt;/a&gt;) by making the _migrateDriver thread &lt;a href=&quot;https://github.com/mongodb/mongo/blob/ea51edf33aa685e8b8d4692ee42b8c0e8e9cfb98/src/mongo/db/s/migration_destination_manager.cpp#L1092&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;check in the session on &quot;outerOpCtx&quot;&lt;/a&gt; in the two places we had found that the _migrateDriver thread blocks on other threads.&lt;/p&gt;

&lt;p&gt;We missed the place where _migrateDriver &lt;a href=&quot;https://github.com/mongodb/mongo/blob/ea51edf33aa685e8b8d4692ee42b8c0e8e9cfb98/src/mongo/db/s/migration_destination_manager.cpp#L494&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;waits for the inserterThread to join&lt;/a&gt;. The same fix (having the _migrateDriver thread check in the session on &quot;outerOpCtx&quot; while blocking) will probably work, but it&apos;s a brittle solution because every such hole needs to be plugged individually. It may be worth finding a better solution.&lt;/p&gt;

&lt;p&gt;CC &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jack.mulrow&quot; class=&quot;user-hover&quot; rel=&quot;jack.mulrow&quot;&gt;jack.mulrow&lt;/a&gt;, since we worked on the current solution together.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="1782552">SERVER-57709</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1786976">SERVER-57756</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1881402">SERVER-60161</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="21777"><![CDATA[v5.0]]></customfieldvalue>
    <customfieldvalue key="18953"><![CDATA[v4.4]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 30 Mar 2021 06:49:02 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 33 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16941"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 33 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>124.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>esha.maharishi@mongodb.com</customfieldvalue>
            <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>matthew.russotto@mongodb.com</customfieldvalue>
            <customfieldvalue>pierlauro.sciarelli@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz1hlz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hyv0rz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz13v3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>