<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:54:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-62216] When abortTenantIndexBuilds failed to abort during a tenant migration, we should wait for the createIndex to finish before continuing the MTM</title>
                <link>https://jira.mongodb.org/browse/SERVER-62216</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We found that when tenant_migration_donor_service tries to abort the index build, the index build is not actually aborted because of the current state of the createIndex, which in this case was&#160;kCommitQuorumSatisfied.&lt;/p&gt;

&lt;p&gt;We should handle the case where&#160;IndexBuildsCoordinator::abortTenantIndexBuilds calls&#160;&lt;br/&gt;
 abortIndexBuildByBuildUUID and it returns false, &lt;del&gt;we should propagate that failure in order to then abort the tenant migration for that reason. We should also use the reason parameter in order to give more information in the log line.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;It was decided that we should instead wait for the index build to finish unregistering. There could also be a race condition where we return committed but the index build hasn&apos;t been unregistered yet, which could be an issue as well.&lt;/p&gt;

&lt;p&gt;The solution is to always wait for&#160;unregisterIndexBuild to finish.&lt;br/&gt;
We should also take this opportunity to add more detail to the logs to help diagnose this type of issue or failure.&lt;/p&gt;

&lt;p&gt;In addition, we will also handle that case so that it is not treated as an error in the tenant_migration.py hook; see&#160;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-62154&quot; title=&quot;The tenant_migration.py script should ignore error related to not supported operations during tenant_migration&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-62154&quot;&gt;&lt;del&gt;SERVER-62154&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1955288">SERVER-62216</key>
            <summary>When abortTenantIndexBuilds failed to abort during a tenant migration, we should wait for the createIndex to finish before continuing the MTM</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="mathis.bessa@mongodb.com">Mathis Bessa</assignee>
                                    <reporter username="mathis.bessa@mongodb.com">Mathis Bessa</reporter>
                        <labels>
                    </labels>
                <created>Tue, 21 Dec 2021 22:37:29 +0000</created>
                <updated>Sun, 29 Oct 2023 21:44:47 +0000</updated>
                            <resolved>Wed, 2 Feb 2022 22:00:39 +0000</resolved>
                                    <version>5.1.0</version>
                                    <fixVersion>5.3.0</fixVersion>
                                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="4307363" author="xgen-internal-githook" created="Sat, 22 Jan 2022 00:02:22 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;mathisbessamdb&apos;, &apos;email&apos;: &apos;mathis.bessa@mongodb.com&apos;, &apos;username&apos;: &apos;mathisbessamdb&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-62216&quot; title=&quot;When abortTenantIndexBuilds failed to abort during a tenant migration, we should wait for the createIndex to finish before continuing the MTM&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-62216&quot;&gt;&lt;del&gt;SERVER-62216&lt;/del&gt;&lt;/a&gt; When abortTenantIndexBuilds failed to abort during a tenant migration&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/98f6025860618721a2fa414a0345caf4c4d5bd57&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/98f6025860618721a2fa414a0345caf4c4d5bd57&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4271899" author="suganthi.mani" created="Fri, 31 Dec 2021 14:54:27 +0000"  >&lt;p&gt;It just occurred to me that, technically, for the case where abortIndexBuildByBuildUUID() returns true for TryAbortResult::kAlreadyAborted(), it is guaranteed that the index build has completed the abort process, i.e., an abortIndexBuild oplog entry has been generated for that index build. Basically, abortIndexBuildByBuildUUID():&lt;br/&gt;
1) &lt;b&gt;Acquires the collection lock in &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1099&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;mode X&lt;/a&gt;&lt;/b&gt;.&lt;br/&gt;
2) Calls &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1135&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;tryAbort()&lt;/a&gt; (sets the signal on the index build).&lt;br/&gt;
3) Calls &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1160&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;_completeAbort()&lt;/a&gt; (removes the index entry from the durable catalog &amp;amp; generates the abortIndexBuild oplog entry).&lt;/p&gt;

&lt;p&gt;This means we can&apos;t have two concurrent threads, one (the tenantMigration thread) executing tryAbort() and the other (e.g., a dropCollection thread) executing _completeAbort(). &lt;del&gt;This means option#1 (Wait for this ReplIndexBuildState::sharedPromise to get fulfilled) is sufficient&lt;/del&gt; and it&apos;s OK to wait for that promise to be fulfilled &lt;b&gt;only&lt;/b&gt; for the case where abortIndexBuildByBuildUUID() returns false. For clarity to readers, I would also recommend correcting &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/db/index_builds_coordinator.h#L275-L280&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this&lt;/a&gt; comment to something like: &lt;br/&gt;
&lt;em&gt;&quot;Returns true if the index build is aborted or already aborted.&quot;&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;&lt;font color=&quot;#DE350B&quot;&gt;&lt;b&gt;UPDATE&lt;/b&gt;:&lt;/font&gt;&lt;br/&gt;
After looking into the code, I realized option#1 is not safe and can make the tenant migration thread hang forever, waiting for ReplIndexBuildState::sharedPromise to get fulfilled. There is a case where the &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1954-L1959&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;index build thread won&apos;t fulfill the ReplIndexBuildState::sharedPromise&lt;/a&gt;. For those cases, a &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/util/future.h#L907-L911&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;BrokenPromise&lt;/a&gt; error is set by the SharedPromise destructor, which is called when the ReplIndexBuildState destructor runs. But the tenantMigration abort thread is holding a &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L788&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;reference&lt;/a&gt; to ReplIndexBuildState, which means the ReplIndexBuildState &amp;amp; sharedPromise destructors won&apos;t be called, causing the tenant migration thread to hang.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;So, I think, it&apos;s safer to wait for the index build to get removed from the index build registry.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;CC &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=mathis.bessa&quot; class=&quot;user-hover&quot; rel=&quot;mathis.bessa&quot;&gt;mathis.bessa&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4271871" author="suganthi.mani" created="Fri, 31 Dec 2021 14:04:50 +0000"  >&lt;p&gt;&lt;b&gt;For future reference:&lt;/b&gt; The following is another scenario where we can hit the same issue as&#160;BF-23463.&#160;IndexBuildsCoordinator::abortTenantIndexBuilds() calls &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L795&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;abortIndexBuildByBuildUUID()&lt;/a&gt; to abort the index builds. abortIndexBuildByBuildUUID() &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/db/index_builds_coordinator.h#L275-L280&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;returns&lt;/a&gt;: &lt;br/&gt;
 1) False if the index build does not exist or the index build is already in the process of committing (i.e., the index build has already received the signal IndexBuildAction::kCommitQuorumSatisfied).&lt;br/&gt;
 2) True if the index build was aborted or the index build is already in the process of being aborted.&lt;/p&gt;

&lt;p&gt;BF-23463 showcases case #1, &quot;where the index build is already in the process of committing&quot;. &lt;b&gt;But we can hit the same BF issue for case #2, &quot;the index build is already in the process of being aborted&quot;&lt;/b&gt;. Looking into the code, we can see that ReplIndexBuildState::tryAbort() returns &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/db/repl_index_build_state.cpp#L316&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;TryAbortResult::kAlreadyAborted()&lt;/a&gt; if the index build has already received the signal &quot;IndexBuildAction::kPrimaryAbort&quot; (before we drop indexes/collections/databases, we try to abort the active index builds using the signal &quot;kPrimaryAbort&quot;) and we are now trying to set &quot;IndexBuildAction::kTenantMigrationAbort&quot;. However, for TryAbortResult::kAlreadyAborted(), IndexBuildsCoordinator::abortTenantIndexBuilds() &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1139-L1140&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;returns true&lt;/a&gt;. In this case, it is not guaranteed that the index build has finished the abort process, i.e., has generated the &quot;abortIndexBuild&quot; oplog entry. This means there is a possibility that an &quot;abortIndexBuild&quot; oplog entry can be generated after the tenant migration has started (i.e., after _startApplyingDonorOpTime), leading to a server crash.&lt;/p&gt;

&lt;p&gt;To tackle both problematic scenarios, our proposal is to wait for the index build to complete (specifically, to make sure the index build has finished generating the commit/abortIndexBuild oplog entry) irrespective of the abortIndexBuildByBuildUUID() return value. There are currently two options:&lt;/p&gt;

&lt;p&gt;1) Wait for this &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/db/repl_index_build_state.h#L421&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;ReplIndexBuildState::sharedPromise&lt;/a&gt; to get fulfilled.&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;b&gt;Rejected:&lt;/b&gt; This still doesn&apos;t guarantee that an &quot;abortIndexBuild&quot; oplog entry has already been generated for the index build in cases where the index build has already received the &quot;kPrimaryAbort&quot; signal.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt; 2) Wait for the index build to get removed from the &lt;a href=&quot;https://github.com/10gen/mongo/blob/23bf8408394c73fc143a8093105a688865f5cd4a/src/mongo/db/active_index_builds.h#L121&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;index build registry&lt;/a&gt;.&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;See here for &lt;a href=&quot;https://github.com/10gen/mongo/blob/5eab3ca8acaa0e8004c50112187b6809667157a2/src/mongo/db/index_builds_coordinator.cpp#L1532-L1544&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;examples&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;CC &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=mathis.bessa&quot; class=&quot;user-hover&quot; rel=&quot;mathis.bessa&quot;&gt;mathis.bessa&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="1953596">SERVER-62154</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 28 Dec 2021 19:41:07 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 2 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16941"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 2 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>mathis.bessa@mongodb.com</customfieldvalue>
            <customfieldvalue>suganthi.mani@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0fdzb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hzylbj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5669">Server Serverless 2021-12-27</customfieldvalue>
    <customfieldvalue id="5670">Server Serverless 2022-01-10</customfieldvalue>
    <customfieldvalue id="5671">Server Serverless 2022-01-24</customfieldvalue>
    <customfieldvalue id="5672">Server Serverless 2022-02-07</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0f04n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>