<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:57:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-41217] Potential deadlock between ShardRegistry and LSC refresh</title>
                <link>https://jira.mongodb.org/browse/SERVER-41217</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;ShardRegistry::reload() on a config server waits for a majority read on a local shard. If it coincides with LogicalSessionCache::refresh(), which performs batch writes, the two may deadlock when the refresh calls &lt;a href=&quot;https://github.com/mongodb/mongo/blob/master/src/mongo/s/catalog_cache.cpp#L111&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;ShardRegistry::getShard()&lt;/a&gt; while &lt;a href=&quot;https://github.com/mongodb/mongo/blob/master/src/mongo/s/catalog_cache.cpp#L257&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;refreshing collectionRoutingInfo&lt;/a&gt;, because getShard() can join the in-progress reload().&lt;br/&gt;
The related stack traces are in BF-12772.&lt;/p&gt;
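&lt;p&gt;The hypothesized cycle can be reduced to two threads, each waiting on an event that only the other can set. The following is a minimal Python sketch under stated assumptions, not MongoDB code: &lt;tt&gt;detect_cycle&lt;/tt&gt;, &lt;tt&gt;reload_done&lt;/tt&gt; and &lt;tt&gt;lsc_write_done&lt;/tt&gt; are hypothetical names, and the model assumes, as described above, that the majority read cannot complete until the LSC batch write does.&lt;/p&gt;

```python
import threading


def detect_cycle(timeout_s=0.5):
    """Toy model of the hypothesized wait cycle; True if both waits time out."""
    # Hypothetical stand-ins for the two conditions, not real MongoDB types.
    reload_done = threading.Event()     # ShardRegistry::reload() has finished
    lsc_write_done = threading.Event()  # LSC batch write is majority-committed
    results = {}

    def lsc_refresh():
        # getShard() joins the in-progress reload and blocks until it ends;
        # only after that can the batch write finish.
        joined = reload_done.wait(timeout_s)
        if joined:
            lsc_write_done.set()
        results["lsc_refresh"] = joined

    def shard_registry_reload():
        # Assumption: the majority read waits for the LSC write to replicate.
        read_ok = lsc_write_done.wait(timeout_s)
        if read_ok:
            reload_done.set()
        results["reload"] = read_ok

    threads = [threading.Thread(target=lsc_refresh),
               threading.Thread(target=shard_registry_reload)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Each thread waits on an event only the other thread can set, so neither
    # wait is ever satisfied and both time out.
    return not results["lsc_refresh"] and not results["reload"]


if __name__ == "__main__":
    print("cycle detected:", detect_cycle())
```

&lt;p&gt;The bounded waits exist only so the toy model terminates and reports the cycle instead of hanging.&lt;/p&gt;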


&lt;h3&gt;&lt;a name=&quot;SuggestedFix&quot;&gt;&lt;/a&gt;Suggested Fix&lt;/h3&gt;
&lt;p&gt;I propose that &lt;a href=&quot;https://github.com/mongodb/mongo/blob/r4.3.0/src/mongo/db/repl/replication_coordinator_impl.cpp#L1330&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;ReplicationCoordinatorImpl::waitUntilOpTimeForRead&lt;/a&gt; check whether the secondaries are up or down. If they are down, it should behave similarly to the case when the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/r4.3.0/src/mongo/db/repl/replication_coordinator_impl.cpp#L1404&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;_isShutdown&lt;/a&gt; flag is set.&lt;/p&gt;</description>
                <environment></environment>
        <key id="771046">SERVER-41217</key>
            <summary>Potential deadlock between ShardRegistry and LSC refresh</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13202">Works as Designed</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="misha.tyulenev@mongodb.com">Misha Tyulenev</reporter>
                        <labels>
                            <label>sharding-wfbf-day</label>
                    </labels>
                <created>Fri, 17 May 2019 18:44:54 +0000</created>
                <updated>Fri, 27 Oct 2023 13:53:13 +0000</updated>
                            <resolved>Thu, 5 Sep 2019 13:31:55 +0000</resolved>
                                    <version>4.0.9</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                    <comments>
                            <comment id="2408758" author="kaloian.manassiev" created="Thu, 5 Sep 2019 13:31:39 +0000"  >&lt;p&gt;There is no deadlock between the LSC thread and the ShardRegistry reload. All the stack traces in BFG-280106 (the main logs from which BF-12772 was created) show every thread involved waiting on either an afterOpTime read or a majority write against the config server primary. However, the config server primary had crashed with an &lt;tt&gt;invariant failure pool-&amp;gt;_checkedOutPool.empty() src/mongo/executor/connection_pool.cpp&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;The more interesting issue in BF-12772 is why the remaining 2 nodes weren&apos;t able to elect a primary, but I will continue that conversation there.&lt;/p&gt;</comment>
                            <comment id="2277688" author="misha.tyulenev" created="Mon, 10 Jun 2019 14:32:09 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.saltz&quot; class=&quot;user-hover&quot; rel=&quot;matthew.saltz&quot;&gt;matthew.saltz&lt;/a&gt; The issues are related, but the scenario is not exactly the same: BF-12772 does not create the &lt;tt&gt;config.system.sessions&lt;/tt&gt; collection. However, the hang condition is similar: waiting for majority while the secondary nodes are down. I&apos;ll look into it further to check whether the root cause is the same.&lt;/p&gt;</comment>
                            <comment id="2276601" author="matthew.saltz" created="Fri, 7 Jun 2019 21:02:06 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=misha.tyulenev&quot; class=&quot;user-hover&quot; rel=&quot;misha.tyulenev&quot;&gt;misha.tyulenev&lt;/a&gt; I think this ticket may be a dupe of the one linked, but I haven&apos;t checked this one to see whether the symptoms are exactly the same.&lt;/p&gt;</comment>
                            <comment id="2270408" author="matthew.saltz" created="Tue, 4 Jun 2019 14:27:47 +0000"  >&lt;p&gt;So is this ticket description inaccurate then?&lt;/p&gt;</comment>
                            <comment id="2269180" author="misha.tyulenev" created="Mon, 3 Jun 2019 17:43:38 +0000"  >&lt;p&gt;Good point: unless replication calls getCollectionRoutingInfo, it should not block.&lt;/p&gt;</comment>
                            <comment id="2269126" author="matthew.saltz" created="Mon, 3 Jun 2019 17:16:08 +0000"  >&lt;p&gt;One thing I&apos;m not quite following is: Why does the LogicalSessionCache refresh block replication?&lt;/p&gt;</comment>
                            <comment id="2269006" author="misha.tyulenev" created="Mon, 3 Jun 2019 16:30:43 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.saltz&quot; class=&quot;user-hover&quot; rel=&quot;matthew.saltz&quot;&gt;matthew.saltz&lt;/a&gt; I don&apos;t think it is a direct dup; the scenario is slightly different. However, the fix for this bug will likely fix the BF you are looking at. In BF-12772 the following scenario happens on node0 of the config shard:&lt;br/&gt;
The LogicalSessionCache thread:&lt;br/&gt;
 calls &lt;tt&gt;mongo::CatalogCache::_getCollectionRoutingInfoAt&lt;/tt&gt;, which schedules &lt;tt&gt;refreshCollectionRoutingInfo&lt;/tt&gt;, which calls &lt;tt&gt;ShardRegistry::getShard&lt;/tt&gt;, which can join &lt;tt&gt;ShardRegistry::reload&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;The ShardRegistry refresh thread calls refresh and waits for replication to complete, which happens only once the write in the LogicalSessionCache refresh thread finishes.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 3 Jun 2019 14:54:06 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 22 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 22 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>matthew.saltz@mongodb.com</customfieldvalue>
            <customfieldvalue>misha.tyulenev@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hv01lb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hur5qf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="3003">Sharding 2019-07-01</customfieldvalue>
    <customfieldvalue id="3061">Sharding 2019-07-15</customfieldvalue>
    <customfieldvalue id="3198">Sharding 2019-09-09</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|huznun:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>