<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 06:29:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
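
A minimal sketch of building such a request URL, assuming a placeholder host and path (the real export URL is not given in this document). The helper name restrict_fields is hypothetical; it only appends one 'field' query parameter per requested field.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def restrict_fields(url, fields):
    """Return url with one 'field=<name>' query parameter appended per field."""
    parts = urlsplit(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(("field", name) for name in fields)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only; substitute the URL of your own request.
base = "https://jira.example.com/path/to/issue.xml"
print(restrict_fields(base, ["key", "summary"]))
```

Repeating the 'field' parameter, rather than joining names with commas, follows the 'field=key&field=summary' form shown above.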
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-75288] Investigate whether the stepdown killop thread should kill operations that hold the RSTL</title>
                <link>https://jira.mongodb.org/browse/SERVER-75288</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The stepdown killop thread currently kills operations that &lt;a href=&quot;https://github.com/mongodb/mongo/blob/56d9c847ef0b81902c80c3f8aa4c921049f02a43/src/mongo/db/repl/replication_coordinator_impl.cpp#L2588&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;took the global lock in a mode conflicting with writes&lt;/a&gt;. We did not kill operations that held the replication state transition lock (RSTL) because, at the time we added the killop thread, reads held the RSTL (this was safe because long-running reads would periodically yield). Leaving them alone gave a better user experience, because otherwise readers would have had to handle interruption during failovers.&lt;/p&gt;

&lt;p&gt;Since the introduction of &lt;a href=&quot;https://jira.mongodb.org/browse/PM-1527&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;lock-free reads&lt;/a&gt;, many reads no longer take the RSTL, so we should be able to start killing operations that hold the RSTL on stepdown.&lt;/p&gt;

&lt;p&gt;This has the benefit of preventing future deadlocks in situations where a thread takes the global lock in IS mode, implicitly also taking the RSTL, but is blocked waiting on a DB S mode lock that conflicts with a prepared transaction. The prepared transaction would be blocked from committing if the node was trying to step down but couldn&apos;t acquire the RSTL because the reader thread already held it.&lt;/p&gt;

&lt;p&gt;This work might also fix deadlocks of this nature that are already possible but that we haven&apos;t noticed yet. However, I&apos;m not yet sure what complications or side effects this change would introduce.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2297842">SERVER-75288</key>
            <summary>Investigate whether the stepdown killop thread should kill operations that hold the RSTL</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="1" iconUrl="https://jira.mongodb.org/images/icons/statuses/open.png" description="">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-repl">Backlog - Replication Team</assignee>
                                    <reporter username="samy.lanka@mongodb.com">Samyukta Lanka</reporter>
                        <labels>
                            <label>former-quick-wins</label>
                            <label>repl-shortlist</label>
                    </labels>
                <created>Fri, 24 Mar 2023 20:49:02 +0000</created>
                <updated>Wed, 19 Jul 2023 10:35:55 +0000</updated>
                                                                                                <votes>1</votes>
                                    <watches>10</watches>
                                                                                                                <comments>
                            <comment id="5576317" author="josef.ahmad" created="Wed, 19 Jul 2023 10:35:55 +0000"  >&lt;p&gt;Interrupting any operation holding the RSTL during stepdown would have prevented the deadlock described in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-78662&quot; title=&quot;Deadlock with index build, step down, prepared transaction, and MODE_IS coll lock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-78662&quot;&gt;&lt;del&gt;SERVER-78662&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="5395869" author="ali.mir" created="Tue, 2 May 2023 21:47:49 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=judah.schvimer%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;judah.schvimer@mongodb.com&quot;&gt;judah.schvimer@mongodb.com&lt;/a&gt; Fair enough, that makes sense to me. I haven&apos;t fully thought it through yet, but I&apos;m just unsure about the potential complexity around introducing interruptions for all internal read operations on failover (internal operations are non-lock free).&lt;/p&gt;</comment>
                            <comment id="5395836" author="judah.schvimer" created="Tue, 2 May 2023 21:41:46 +0000"  >&lt;blockquote&gt;
&lt;p&gt;we&apos;d interrupt any non-lock free read operation on failover.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I think we want to do this. To &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=schwerin%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;schwerin@mongodb.com&quot;&gt;schwerin@mongodb.com&lt;/a&gt;&apos;s point, the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/4346ec8eeae8c7c01553b09c4877c6e577de3353/src/mongo/db/repl/README.md#replication-state-transition-lock&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;purpose of the RSTL&lt;/a&gt; is to synchronize operations with failover. Thus if an operation doesn&apos;t want to be interrupted on failover, it shouldn&apos;t hold the RSTL. &lt;/p&gt;</comment>
                            <comment id="5395793" author="ali.mir" created="Tue, 2 May 2023 21:27:27 +0000"  >&lt;p&gt;Reassigning back to backlog as I&apos;m beginning my loan on Sharding. Happy to continue the conversation with whoever picks this up.&lt;/p&gt;</comment>
                            <comment id="5395761" author="ali.mir" created="Tue, 2 May 2023 21:11:47 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=schwerin%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;schwerin@mongodb.com&quot;&gt;schwerin@mongodb.com&lt;/a&gt; Are you suggesting that we kill operations that hold the RSTL and a DB/collection lock in S mode? Or generally kill operations that hold the RSTL on stepdown? I agree with the first case, and maybe we should consider moving ahead with that idea. The second case seems too broad &#8212; we&apos;d interrupt any non-lock free read operation on failover.&lt;/p&gt;</comment>
                            <comment id="5395712" author="schwerin" created="Tue, 2 May 2023 20:59:13 +0000"  >&lt;p&gt;If we don&apos;t kill these operations, how do we ensure that they abort or release their locks to make room for unplanned step downs? The original use for this lock was to gather up and kill operations that would otherwise block step downs.&lt;/p&gt;</comment>
                            <comment id="5395538" author="ali.mir" created="Tue, 2 May 2023 20:09:37 +0000"  >&lt;p&gt;Some notes: even though lock-free reads are used for external non-transaction read operations, internal readers still take the RSTL, and killing those operations would result in interruptions. For that reason, I don&apos;t think broadly killing RSTL-holding operations on step down would be feasible. Beyond that, the question for this ticket becomes: how can we prevent deadlock scenarios like in the linked BF ticket? The ultimate root cause of the deadlock was the DBLock acquisition in S mode instead of IS mode, and therefore I&apos;m investigating to see if there is a way to only kill operations that have taken a DB or Collection lock in S mode whilst holding the RSTL in IX mode. &lt;/p&gt;

&lt;p&gt;We don&apos;t have many cases where we explicitly take the DB or collection lock in S mode, but in the existing ones I believe it&apos;s possible to reproduce the same deadlock. However, my concern is that there are some operations (such as &lt;a href=&quot;https://github.com/10gen/mongo/blob/49e521352a4aea1ee5b5bb5132de9b6e3fccf405/src/mongo/db/commands/dbhash.cpp#L249-L259&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;non-PIT dbhash&lt;/a&gt;) that explicitly take the DB or collection lock in S mode. Killing these operations would still introduce new interruptions (whereas before, they&apos;d complete or yield). However, given the same scenario with a prepared txn and stepup, the same operation would deadlock, so perhaps an interrupt is a better solution here.&lt;/p&gt;

&lt;p&gt;Also, looks like &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-71198&quot; title=&quot;Assert that unkillable operations that take X collection locks do not hold the RSTL&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-71198&quot;&gt;SERVER-71198&lt;/a&gt; is quite similar to this ticket (overall PM-3075 is related). Thinking out loud: I wonder if it would be beneficial to share implementation between &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-71198&quot; title=&quot;Assert that unkillable operations that take X collection locks do not hold the RSTL&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-71198&quot;&gt;SERVER-71198&lt;/a&gt; and this one.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="2297785">SERVER-75285</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2383665">SERVER-78662</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2178905">SERVER-71198</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25128"><![CDATA[Replication]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 2 May 2023 20:09:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        29 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>josef.ahmad@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            29 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>135.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>ali.mir@mongodb.com</customfieldvalue>
            <customfieldvalue>schwerin@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-server-repl</customfieldvalue>
            <customfieldvalue>josef.ahmad@mongodb.com</customfieldvalue>
            <customfieldvalue>judah.schvimer@mongodb.com</customfieldvalue>
            <customfieldvalue>samy.lanka@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i21tev:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i1k67c:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="7169">Repl 2023-04-17</customfieldvalue>
    <customfieldvalue id="7170">Repl 2023-05-01</customfieldvalue>
    <customfieldvalue id="7171">Repl 2023-05-15</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i21fk7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>