<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:57:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-41355] Step down should call yieldLocksForPreparedTransactions w/o holding repl mutex lock (ReplicationCoordinatorImpl::_mutex).</title>
                <link>https://jira.mongodb.org/browse/SERVER-41355</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Currently, step down calls&#160;yieldLocksForPreparedTransactions while holding both the RSTL and the repl mutex lock. As a result, it can deadlock with prepared txn threads that have checked out the session. Consider the case below.&#160;&lt;/p&gt;

&lt;p&gt;1) Thread A (txn cmd) has checked out the session.&lt;br/&gt;
 2) Step down has acquired RSTL lock and repl mutex lock.&lt;br/&gt;
 3)&#160;Step down calls &lt;em&gt;yieldLocksForPreparedTransactions&lt;/em&gt;, which marks thread A as killed because it has checked out the session and its transaction state is &lt;b&gt;TransactionState::kPrepared&lt;/b&gt;.&lt;br/&gt;
 4)&lt;font color=&quot;#de350b&quot;&gt; Thread A tries to acquire repl mutex lock which is held by step down thread&lt;/font&gt;.&lt;br/&gt;
 5)&lt;font color=&quot;#de350b&quot;&gt;&#160;Step down waits for thread A to check in the session&lt;/font&gt;, so that it can check out the session and yield the locks of that prepared txn. But &lt;font color=&quot;#de350b&quot;&gt;thread A can&apos;t check in the session because it is waiting for the repl mutex lock, which is not interruptible.&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-41317&quot; title=&quot;Push commitTransaction&amp;#39;s check for a majority-committed prepare down into the TransactionParticipant&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-41317&quot;&gt;&lt;del&gt;SERVER-41317&lt;/del&gt;&lt;/a&gt; describes a problem caused by the above scenario.&lt;/p&gt;</description>
                <environment></environment>
        <key id="780754">SERVER-41355</key>
            <summary>Step down should call yieldLocksForPreparedTransactions w/o holding repl mutex lock (ReplicationCoordinatorImpl::_mutex).</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="suganthi.mani@mongodb.com">Suganthi Mani</assignee>
                                    <reporter username="suganthi.mani@mongodb.com">Suganthi Mani</reporter>
                        <labels>
                    </labels>
                <created>Wed, 29 May 2019 14:20:03 +0000</created>
                <updated>Sun, 29 Oct 2023 22:20:39 +0000</updated>
                            <resolved>Mon, 15 Jul 2019 16:21:23 +0000</resolved>
                                                    <fixVersion>4.2.0-rc3</fixVersion>
                    <fixVersion>4.3.1</fixVersion>
                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="2327160" author="xgen-internal-githook" created="Mon, 15 Jul 2019 15:43:59 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Suganthi Mani&apos;, &apos;email&apos;: &apos;suganthi.mani@mongodb.com&apos;, &apos;username&apos;: &apos;smani87&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-41355&quot; title=&quot;Step down should call yieldLocksForPreparedTransactions w/o holding repl mutex lock (ReplicationCoordinatorImpl::_mutex).&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-41355&quot;&gt;&lt;del&gt;SERVER-41355&lt;/del&gt;&lt;/a&gt; Step down calls yieldLocksForPreparedTransactions without&lt;br/&gt;
holding repl mutex.&lt;/p&gt;

&lt;p&gt;(cherry picked from commit cc1a75e4a6d8de8478e7253da7bd6376052d57a6)&lt;br/&gt;
Branch: v4.2&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/79bd2579429eb6827e2e87e0321d6cad32395c6a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/79bd2579429eb6827e2e87e0321d6cad32395c6a&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2327144" author="xgen-internal-githook" created="Mon, 15 Jul 2019 15:35:50 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Suganthi Mani&apos;, &apos;username&apos;: &apos;smani87&apos;, &apos;email&apos;: &apos;suganthi.mani@mongodb.com&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-41355&quot; title=&quot;Step down should call yieldLocksForPreparedTransactions w/o holding repl mutex lock (ReplicationCoordinatorImpl::_mutex).&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-41355&quot;&gt;&lt;del&gt;SERVER-41355&lt;/del&gt;&lt;/a&gt; Step down calls yieldLocksForPreparedTransactions without&lt;br/&gt;
holding repl mutex.&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/cc1a75e4a6d8de8478e7253da7bd6376052d57a6&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/cc1a75e4a6d8de8478e7253da7bd6376052d57a6&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2288520" author="suganthi.mani" created="Tue, 18 Jun 2019 15:18:08 +0000"  >&lt;p&gt;&lt;b&gt;Investigation:&lt;/b&gt;&lt;br/&gt;
 Currently, the step down cmd (conditional step down) should not release the repl mutex lock before calling yieldLocksForPreparedTransactions(). Otherwise, concurrent step ups or step downs can happen, which can lead to a server crash.&lt;/p&gt;

&lt;p&gt;For step down cmd, we call &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2092&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;yieldLocksForPreparedTransactions()&lt;/a&gt; only after &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2046&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;stepping down &lt;/a&gt; (i.e. &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/topology_coordinator.cpp#L2411&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;TopologyCoordinator::_role&lt;/a&gt; is set to Role::kFollower and &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/topology_coordinator.cpp#L2412&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;TopologyCoordinator::_leaderMode&lt;/a&gt; is set to LeaderMode::kNotLeader) but the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2096&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;member state in the replicationCoordinator&lt;/a&gt; is not yet updated.&lt;/p&gt;

&lt;p&gt;Concurrent step up:&#160;&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;If we release the repl mutex lock after stepping down and before yieldLocksForPreparedTransactions(), a concurrent step up can start an election because the node is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L879&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;no longer a leader or candidate&lt;/a&gt;. Even though the step up gets &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl_elect_v1.cpp#L252&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;stuck&lt;/a&gt; before starting a real election (it needs to persist the last vote info to disk, which requires the RSTL in IX mode, but it is blocked behind the step down, which holds the RSTL in X mode), a server crash can still happen. To start an election, the step up changes &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/topology_coordinator.cpp#L2805&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;TopologyCoordinator::_role to kCandidate&lt;/a&gt;. If the _handleHeartbeatResponse() code path then notices that the member states maintained by replicationCoordinator and topologyCoordinator differ, it will try to update the replicationCoordinator&apos;s member state, which can lead to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2865&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this&lt;/a&gt; invariant failure.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Concurrent step down:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Since the member state of replicationCoordinator is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2649&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;still primary&lt;/a&gt;, a step down triggered by a force reconfig or heartbeat reconfig can occur and can lead to an &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2650&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;unsafe/invalid&lt;/a&gt; state where the node&apos;s role is kFollower but the leaderMode is kSteppingDown.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Note that releasing the repl mutex lock before calling yieldLocksForPreparedTransactions() is not a problem for the unconditional step down code paths, as we haven&apos;t &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L421&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;stepped down&lt;/a&gt; (i.e. the _role and _leaderMode values of the topologyCoordinator have not yet changed) before &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L418&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;yieldLocksForPreparedTransactions()&lt;/a&gt;. So no concurrent step ups or step downs can happen.&lt;/p&gt;

&lt;p&gt;&#160;&lt;b&gt;Solution:&lt;/b&gt;&lt;br/&gt;
 To make the step down code paths call yieldLocksForPreparedTransactions() without holding the repl mutex lock, the step down cmd (conditional step down) should perform the sequence below.&lt;/p&gt;

&lt;p&gt;1) TopologyCoordinator::attemptStepDown &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/topology_coordinator.cpp#L2263&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;should not perform step down&lt;/a&gt; (i.e. should not change the role or leaderMode). Instead, it will upgrade its status from &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7c3118aa8d612ffb91dd0c8ca43e730953b7900e/src/mongo/db/repl/topology_coordinator.cpp#L1315&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;conditional step down to unconditional step down&lt;/a&gt; (i.e. the leaderMode transitions from kAttemptingStepDown to kSteppingDown). By doing this, we prevent any concurrent step ups and step downs. Also, we guarantee that the step down cmd won&apos;t fail after this point and that it is&#160;&lt;b&gt;safe to release the repl mutex lock w/ RSTL lock held.&lt;/b&gt;&lt;br/&gt;
 2) Release the repl mutex lock.&lt;br/&gt;
 3) Call yieldLocksForPreparedTransactions().&lt;br/&gt;
 4) Reacquire the repl mutex lock.&lt;br/&gt;
 5) Now perform step down (i.e. call finishUnconditionalStepDown()).&lt;/p&gt;</comment>
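The five-step sequence above can be sketched as a small Python model. This is only an illustration under stated assumptions, not the server code: repl_mutex stands in for ReplicationCoordinatorImpl::_mutex, the RSTL is not modelled, and the names txn_thread, killed, session_checked_in and the state dictionary are hypothetical helpers introduced for this example.

```python
import threading

repl_mutex = threading.Lock()           # stand-in for ReplicationCoordinatorImpl::_mutex
state = {"leader_mode": "kAttemptingStepDown", "role": "kLeader"}
killed = threading.Event()              # set when step down marks thread A as killed
session_checked_in = threading.Event()  # set when thread A checks the session back in


def txn_thread():
    """Thread A: owns the session until killed, then needs the repl mutex briefly."""
    killed.wait()
    with repl_mutex:                    # would block forever if step down still held it
        pass
    session_checked_in.set()


def yield_locks_for_prepared_transactions(timeout):
    """Kill the session owner, then wait for the session to be checked in."""
    killed.set()
    return session_checked_in.wait(timeout)


def step_down(timeout=5.0):
    with repl_mutex:
        # Step 1: upgrade to unconditional step down; the role is left unchanged.
        state["leader_mode"] = "kSteppingDown"
    # Step 2: the mutex is released here. Step 3: yield locks without holding it.
    yielded = yield_locks_for_prepared_transactions(timeout)
    with repl_mutex:                    # Step 4: reacquire the mutex.
        if yielded:
            # Step 5: finish the step down.
            state["role"] = "kFollower"
            state["leader_mode"] = "kNotLeader"
    return yielded


worker = threading.Thread(target=txn_thread)
worker.start()
result = step_down()
worker.join()
```

If yield_locks_for_prepared_transactions were instead called inside the first with block, thread A could never take repl_mutex and the wait would only end at the timeout, which mirrors the deadlock described in the issue.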
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="777377">SERVER-41317</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16775"><![CDATA[v4.2]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 15 Jul 2019 15:35:50 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 30 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16941"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 30 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>suganthi.mani@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hv1o7z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr84vz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="2999">Repl 2019-06-03</customfieldvalue>
    <customfieldvalue id="3000">Repl 2019-06-17</customfieldvalue>
    <customfieldvalue id="3001">Repl 2019-07-01</customfieldvalue>
    <customfieldvalue id="3026">Repl 2019-07-15</customfieldvalue>
    <customfieldvalue id="3028">Repl 2019-07-29</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hv1ahb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>