<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:48:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-59965] Distributed deadlock between renameCollection and multi-shard transaction</title>
                <link>https://jira.mongodb.org/browse/SERVER-59965</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;As part of a sharded renameCollection, the DDLCoordinator &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/rename_collection_coordinator.cpp#L265-L297&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;instructs all participant shards&lt;/a&gt; to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/rename_collection_participant_service.cpp#L250-L251&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;enter their critical sections&lt;/a&gt;. When all shards have entered it, the coordinator will do some work on the configsvr and finally it will tell the shards to leave their critical section.&lt;/p&gt;

&lt;p&gt;When running renameCollection concurrently with multi-shard transactions that affect that same collection, there exists a particular interleaving that can lead to a distributed deadlock:&lt;br/&gt;
1. shard0 receives the RenameCollectionParticipant command and &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/rename_collection_participant_service.cpp#L250-L251&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;enters its critical section&lt;/a&gt;&lt;br/&gt;
2. shard0 attempts to run a statement of the multi-shard txn. Since the critical section is taken, it will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/collection_sharding_runtime.cpp#L343-L349&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;throw StaleConfig&lt;/a&gt;. This error will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/master/src/mongo/db/service_entry_point_common.cpp#L1598-L1612&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;be caught on the way out of the command&lt;/a&gt; and it will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/service_entry_point_common.cpp#L1605&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;attempt to refresh the shardVersion&lt;/a&gt;. However, since the critical section is taken, the refresh will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/shard_filtering_metadata_refresh.cpp#L226&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;block until the critical section is released&lt;/a&gt;.&lt;br/&gt;
3. shard1 runs its part of that multi-shard transaction, which will acquire the collection lock in MODE_IX, and then stash the locks.&lt;br/&gt;
4. shard1 receives the RenameCollectionParticipant and attempts to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/rename_collection_participant_service.cpp#L250-L251&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;enter the critical section&lt;/a&gt;. However, since the transaction at point 3 had stashed the collection lock, we are not able to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/a444c63c396129836c6a5f230a465e9cc651e921/src/mongo/db/s/recoverable_critical_section_service.cpp#L85&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;acquire the collection lock in MODE_S&lt;/a&gt; needed to enter the critical section.&lt;/p&gt;

&lt;p&gt;At this point we are deadlocked:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;shard0 is holding the critical section and won&apos;t release it until shard1 enters its own.&lt;/li&gt;
	&lt;li&gt;shard1 is holding the collection lock in MODE_IX until the txn gets committed, which won&apos;t happen because the txn (or perhaps, rather, the refresh) is not making progress on shard0 due to the critical section.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;More generally, I believe this situation can occur in any DDL operation that needs to acquire the critical section in several nodes at the same time. I believe that resharding may also be affected by this.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="1875034">SERVER-59965</key>
            <summary>Distributed deadlock between renameCollection and multi-shard transaction</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="jordi.serra-torrens@mongodb.com">Jordi Serra Torrens</assignee>
                                    <reporter username="jordi.serra-torrens@mongodb.com">Jordi Serra Torrens</reporter>
                        <labels>
                    </labels>
                <created>Wed, 15 Sep 2021 15:11:10 +0000</created>
                <updated>Sun, 29 Oct 2023 21:48:36 +0000</updated>
                            <resolved>Mon, 25 Oct 2021 07:59:24 +0000</resolved>
                                                    <fixVersion>5.2.0</fixVersion>
                    <fixVersion>5.0.4</fixVersion>
                    <fixVersion>5.1.0-rc3</fixVersion>
                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="4155088" author="xgen-internal-githook" created="Thu, 28 Oct 2021 17:09:00 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Jordi Serra Torrens&apos;, &apos;email&apos;: &apos;jordi.serra-torrens@mongodb.com&apos;, &apos;username&apos;: &apos;jordist&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-59965&quot; title=&quot;Distributed deadlock between renameCollection and multi-shard transaction&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-59965&quot;&gt;&lt;del&gt;SERVER-59965&lt;/del&gt;&lt;/a&gt; Limit max time wait behind critical section during filtering metadata refresh in txn&lt;/p&gt;

&lt;p&gt;(cherry picked from commit 02add56a2100bef135281938a0cadaf374279f03)&lt;br/&gt;
Branch: v5.0&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/2fe5ed35b58f3f879cdf4200133102a9ae18d9ca&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/2fe5ed35b58f3f879cdf4200133102a9ae18d9ca&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4154936" author="xgen-internal-githook" created="Thu, 28 Oct 2021 16:18:14 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Jordi Serra Torrens&apos;, &apos;email&apos;: &apos;jordi.serra-torrens@mongodb.com&apos;, &apos;username&apos;: &apos;jordist&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-59965&quot; title=&quot;Distributed deadlock between renameCollection and multi-shard transaction&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-59965&quot;&gt;&lt;del&gt;SERVER-59965&lt;/del&gt;&lt;/a&gt; Limit max time wait behind critical section during filtering metadata refresh in txn&lt;/p&gt;

&lt;p&gt;(cherry picked from commit 02add56a2100bef135281938a0cadaf374279f03)&lt;br/&gt;
Branch: v5.1&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/fe4cbeb6d0fa079e80b1a300cd4ec8a56cffdd77&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/fe4cbeb6d0fa079e80b1a300cd4ec8a56cffdd77&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4144591" author="xgen-internal-githook" created="Mon, 25 Oct 2021 07:37:18 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Jordi Serra Torrens&apos;, &apos;email&apos;: &apos;jordi.serra-torrens@mongodb.com&apos;, &apos;username&apos;: &apos;jordist&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-59965&quot; title=&quot;Distributed deadlock between renameCollection and multi-shard transaction&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-59965&quot;&gt;&lt;del&gt;SERVER-59965&lt;/del&gt;&lt;/a&gt; Limit max time wait behind critical section during filtering metadata refresh in txn&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/02add56a2100bef135281938a0cadaf374279f03&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/02add56a2100bef135281938a0cadaf374279f03&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4065453" author="JIRAUSER1257318" created="Thu, 16 Sep 2021 13:05:53 +0000"  >&lt;p&gt;The proposal is to resolve the deadlock by skipping &lt;a href=&quot;https://github.com/mongodb/mongo/blob/2a363cc12f1a0b7deb23d51efd9efea2ccf79549/src/mongo/db/service_entry_point_common.cpp#L1605&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this refresh&lt;/a&gt; (which blocks behind the critical section) when we are in a transaction and the critical section is taken. The StaleConfig error will then be propagated to the client with a TransientTransactionError label, so the transaction will be retried.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="1837663">SERVER-58991</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10320">
                    <name>Documented</name>
                                                                <inwardlinks description="is documented by">
                                        <issuelink>
            <issuekey id="1908080">DOCS-14892</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1913006">DOCS-14907</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="335148" name="0001-SERVER-59965-repro.patch" size="8478" author="jordi.serra-torrens@mongodb.com" created="Wed, 15 Sep 2021 15:13:04 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="22495"><![CDATA[v5.1]]></customfieldvalue>
    <customfieldvalue key="21777"><![CDATA[v5.0]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 16 Sep 2021 13:53:54 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 14 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_17052" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Downstream Changes Summary</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Added new &amp;#39;metadataRefreshInTransactionMaxWaitBehindCritSecMS&amp;#39; server parameter.</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16942"><![CDATA[Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 14 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>jordi.serra-torrens@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01rlr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hzgtzj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_22250" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Special Downgrade Instructions Required</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="23343"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5181">Sharding EMEA 2021-09-20</customfieldvalue>
    <customfieldvalue id="5304">Sharding EMEA 2021-10-04</customfieldvalue>
    <customfieldvalue id="5425">Sharding EMEA 2021-10-18</customfieldvalue>
    <customfieldvalue id="5426">Sharding EMEA 2021-11-01</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt; &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/335148/335148_0001-SERVER-59965-repro.patch&quot; title=&quot;0001-SERVER-59965-repro.patch attached to SERVER-59965&quot;&gt;0001-SERVER-59965-repro.patch&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_17051" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Teams Impacted</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16943"><![CDATA[Cloud]]></customfieldvalue>
    <customfieldvalue key="16944"><![CDATA[Docs]]></customfieldvalue>
    <customfieldvalue key="16946"><![CDATA[Triage and Release]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01dr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>