<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:19:12 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-28840] replSetSyncFrom causes InitialSyncer and ReplicationCoordinator to acquire each other&apos;s mutexes in opposite orders</title>
                <link>https://jira.mongodb.org/browse/SERVER-28840</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;This issue was originally discovered by the Coverity Static Analysis tool.&lt;/p&gt;

&lt;p&gt;Consider the following lock acquisitions in &lt;tt&gt;InitialSyncer&lt;/tt&gt; and &lt;tt&gt;ReplicationCoordinatorImpl&lt;/tt&gt;:&lt;/p&gt;

&lt;h5&gt;&lt;a name=&quot;%7B%7BReplicationCoordinatorImpl%3A%3AprocessReplSetSy...&quot;&gt;&lt;/a&gt;&lt;tt&gt;ReplicationCoordinatorImpl::processReplSetSyncFrom&lt;/tt&gt;&lt;/h5&gt;
&lt;ol&gt;
	&lt;li&gt;Acquire &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/replication_coordinator_impl.cpp#L2005&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Acquire &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/replication_coordinator_impl.cpp#L2008&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h5&gt;&lt;a name=&quot;%7B%7BInitialSyncer%3A%3AmultiApplierCallback%7D%7D&quot;&gt;&lt;/a&gt;&lt;tt&gt;InitialSyncer::_multiApplierCallback&lt;/tt&gt;&lt;/h5&gt;
&lt;ol&gt;
	&lt;li&gt;Acquire &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/initial_syncer.cpp#L952&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Acquire &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/initial_syncer.cpp#L963&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Because these two functions acquire the same two locks in opposite orders, they can deadlock if they run concurrently. One way to fix this would be to stop &lt;tt&gt;InitialSyncer&lt;/tt&gt; from updating the optime of the &lt;tt&gt;ReplicationCoordinator&lt;/tt&gt; on every batch. Alternatively, &lt;tt&gt;_multiApplierCallback&lt;/tt&gt; could call &lt;tt&gt;_opts.setLastOpTime&lt;/tt&gt; without holding its own mutex, since it doesn&apos;t seem necessary to synchronize access to &lt;tt&gt;InitialSyncer::_lastApplied&lt;/tt&gt; after it has been written in that function.&lt;/p&gt;
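
&lt;p&gt;The second option above can be illustrated with a minimal sketch. It is written in Python rather than the server&apos;s C++, and the classes &lt;tt&gt;Coordinator&lt;/tt&gt; and &lt;tt&gt;Syncer&lt;/tt&gt;, along with all of their methods, are hypothetical stand-ins rather than the actual MongoDB code. The callback copies the applied optime while holding its own mutex, releases that mutex, and only then calls into the coordinator, so it never waits for the coordinator mutex while holding the syncer mutex.&lt;/p&gt;

```python
import threading


class Coordinator:
    """Hypothetical stand-in for ReplicationCoordinatorImpl."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.last_op_time = 0
        self.syncer = None

    def set_last_op_time(self, op_time):
        with self._mutex:
            self.last_op_time = op_time

    def process_sync_from(self):
        # Coordinator mutex first, then the syncer mutex (inside last_applied()).
        with self._mutex:
            return self.syncer.last_applied()


class Syncer:
    """Hypothetical stand-in for InitialSyncer."""

    def __init__(self, coordinator):
        self._mutex = threading.Lock()
        self._last_applied = 0
        self._coordinator = coordinator

    def last_applied(self):
        with self._mutex:
            return self._last_applied

    def multi_applier_callback(self, op_time):
        # Update state under our own mutex and copy the value out.
        with self._mutex:
            self._last_applied = op_time
            applied = self._last_applied
        # The syncer mutex is released here, so the call below never waits
        # for the coordinator mutex while the syncer mutex is held.
        self._coordinator.set_last_op_time(applied)


def run_demo(iterations=2000):
    """Run both code paths concurrently; return (finished, last_op_time)."""
    coordinator = Coordinator()
    syncer = Syncer(coordinator)
    coordinator.syncer = syncer

    def applier():
        for op_time in range(1, iterations + 1):
            syncer.multi_applier_callback(op_time)

    def sync_from():
        for _ in range(iterations):
            coordinator.process_sync_from()

    threads = [
        threading.Thread(target=applier, daemon=True),
        threading.Thread(target=sync_from, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    finished = not any(t.is_alive() for t in threads)
    return finished, coordinator.last_op_time
```

&lt;p&gt;Calling &lt;tt&gt;run_demo()&lt;/tt&gt; exercises both paths from two threads. In this sketch only the coordinator path ever holds both mutexes at once, so the circular wait needed for a deadlock cannot form. Whether the real &lt;tt&gt;_multiApplierCallback&lt;/tt&gt; can safely release its mutex at that point is an assumption that would need checking against the actual code.&lt;/p&gt;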

&lt;p&gt;This issue also occurs in &lt;tt&gt;InitialSyncer::_getNextApplierBatchCallback&lt;/tt&gt;, which acquires the InitialSyncer mutex, and then tries to acquire ReplicationCoordinator&apos;s mutex when calling &lt;tt&gt;_opts.getSlaveDelay()&lt;/tt&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Original Coverity Report Message:&lt;/p&gt;

&lt;p&gt;Defect 100780 (STATIC_C)&lt;br/&gt;
Checker ORDER_REVERSAL (subcategory none)&lt;br/&gt;
File:  &lt;tt&gt;/src/mongo/db/repl/replication_coordinator_impl.cpp&lt;/tt&gt;&lt;br/&gt;
Function &lt;tt&gt;mongo::repl::ReplicationCoordinatorImpl::processReplSetSyncFrom(mongo::OperationContext *, const mongo::HostAndPort &amp;amp;, mongo::BSONObjBuilder *)&lt;/tt&gt;&lt;/p&gt;
</description>
                <environment></environment>
        <key id="375115">SERVER-28840</key>
            <summary>replSetSyncFrom causes InitialSyncer and ReplicationCoordinator to acquire each other&apos;s mutexes in opposite orders</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="backlog-server-repl">Backlog - Replication Team</assignee>
                                    <reporter username="xgen-internal-coverity">Coverity Collector User</reporter>
                        <labels>
                            <label>coverity</label>
                            <label>neweng</label>
                            <label>syncSourceSelection</label>
                    </labels>
                <created>Tue, 18 Apr 2017 15:17:10 +0000</created>
                <updated>Fri, 27 Oct 2023 20:44:08 +0000</updated>
                            <resolved>Mon, 18 Jun 2018 18:58:40 +0000</resolved>
                                                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="1923948" author="william.schultz" created="Mon, 18 Jun 2018 18:58:40 +0000"  >&lt;p&gt;Will be fixed by &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-31487&quot; title=&quot;Replace replSetSyncFrom resync option with initialSyncSource server parameter&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-31487&quot;&gt;&lt;del&gt;SERVER-31487&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="1911469" author="spencer" created="Tue, 5 Jun 2018 17:50:41 +0000"  >&lt;p&gt;This can likely be closed once &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-31487&quot; title=&quot;Replace replSetSyncFrom resync option with initialSyncSource server parameter&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-31487&quot;&gt;&lt;del&gt;SERVER-31487&lt;/del&gt;&lt;/a&gt; is implemented.&lt;/p&gt;</comment>
                            <comment id="1615300" author="william.schultz" created="Thu, 6 Jul 2017 15:32:51 +0000"  >&lt;p&gt;The issue Coverity picked up on is an inconsistent lock acquisition order between &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; and &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt;, which can lead to a deadlock. Consider the following two functions and the order in which they acquire locks:&lt;/p&gt;

&lt;h5&gt;&lt;a name=&quot;%7B%7BReplicationCoordinatorImpl%3A%3AprocessReplSetSy...&quot;&gt;&lt;/a&gt;&lt;tt&gt;ReplicationCoordinatorImpl::processReplSetSyncFrom&lt;/tt&gt;&lt;/h5&gt;
&lt;ol&gt;
	&lt;li&gt;Acquire &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/replication_coordinator_impl.cpp#L2005&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Acquire &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/replication_coordinator_impl.cpp#L2008&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h5&gt;&lt;a name=&quot;%7B%7BInitialSyncer%3A%3AmultiApplierCallback%7D%7D&quot;&gt;&lt;/a&gt;&lt;tt&gt;InitialSyncer::_multiApplierCallback&lt;/tt&gt;&lt;/h5&gt;
&lt;ol&gt;
	&lt;li&gt;Acquire &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/initial_syncer.cpp#L952&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Acquire &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt; &lt;a href=&quot;https://github.com/mongodb/mongo/blob/72e31a462ab80abfdfc36fb76443ec48fc3f65c9/src/mongo/db/repl/initial_syncer.cpp#L963&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;code &lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Because these two functions acquire the same two locks in opposite orders, a deadlock is possible if they run on separate threads and their lock acquisitions interleave in a particular way. Triggering this deadlock is probably quite unlikely in practice: the &lt;tt&gt;replSetSyncFrom&lt;/tt&gt; command would have to be processed during the execution of &lt;tt&gt;_multiApplierCallback&lt;/tt&gt;, and the threads would have to interleave so that each holds one mutex while waiting for the other. To fix this issue, we could review the lock acquisition order for this pair of locks and make it consistent throughout the codebase where possible, to avoid these kinds of potential deadlocks.&lt;/p&gt;

&lt;p&gt;Note: Through some manual testing, I was actually able to trigger this deadlock externally by adding an extra sleep between the &lt;tt&gt;_multiApplierCallback&lt;/tt&gt;&apos;s acquisition of the &lt;tt&gt;InitialSyncer::_mutex&lt;/tt&gt; and the &lt;tt&gt;ReplicationCoordinatorImpl::_mutex&lt;/tt&gt; and then running an initial sync and calling the &lt;tt&gt;replSetSyncFrom&lt;/tt&gt; command at the correct time during its execution.&lt;/p&gt;

</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="375424">SERVER-28859</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="375893">SERVER-28886</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="443242">SERVER-31487</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="536717">SERVER-34758</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="553818">SERVER-35372</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25128"><![CDATA[Replication]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 6 Jul 2017 15:32:51 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 34 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 34 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-repl</customfieldvalue>
            <customfieldvalue>xgen-internal-coverity</customfieldvalue>
            <customfieldvalue>spencer@mongodb.com</customfieldvalue>
            <customfieldvalue>william.schultz@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht5znj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|ht1a6n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs3qw7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>