<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:59:06 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-41934] Filtering metadata could be stale and serve queries if stepdown happens during migration</title>
                <link>https://jira.mongodb.org/browse/SERVER-41934</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;During migration, we &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/migration_source_manager.cpp#L340-L351&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;persist a critical section counter&lt;/a&gt; which is replicated to secondaries to make them &lt;a href=&quot;https://github.com/mongodb/mongo/blob/3b8f36f2f25f08b2da7e4d560e3207d015bbb978/src/mongo/db/s/shard_server_op_observer.cpp#L290&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;clear their filtering metadata&lt;/a&gt; so that their next refresh will see the result of the migration. The idea is that when the secondary refreshes it will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/shard_server_catalog_cache_loader.cpp#L590&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;refresh from the primary&lt;/a&gt; by &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/shard_server_catalog_cache_loader.cpp#L284-L290&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;calling forceRoutingTableRefresh on the primary and waiting for it to replicate&lt;/a&gt;, &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/flush_routing_table_cache_updates_command.cpp#L118-L129&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;which waits for the critical section before refreshing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, when we persist that critical section counter, &lt;a href=&quot;https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/shard_metadata_util.cpp#L205-L248&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;we don&apos;t use majority write concern&lt;/a&gt;, and we never wait for majority before we &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/migration_source_manager.cpp#L424-L430&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;commit the migration on the config server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This means the following sequence is possible:&lt;br/&gt;
1. Start a migration&lt;br/&gt;
2. Write the critical section counter. Suppose it doesn&apos;t get replicated at all.&lt;br/&gt;
3. Commit the migration on the config server.&lt;br/&gt;
4. Failover&lt;br/&gt;
5. A new primary is elected which does not know that a migration has occurred, and could continue serving requests for a router which is equally as stale as the secondary, leading to stale data being read.&lt;/p&gt;

&lt;p&gt;We should verify this with a jstest and then fix by persisting the critical section counter with majority write concern.&lt;/p&gt;</description>
                <environment></environment>
        <key id="816052">SERVER-41934</key>
            <summary>Filtering metadata could be stale and serve queries if stepdown happens during migration</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="matthew.saltz@mongodb.com">Matthew Saltz</reporter>
                        <labels>
                    </labels>
                <created>Wed, 26 Jun 2019 16:05:11 +0000</created>
                <updated>Fri, 27 Oct 2023 20:42:49 +0000</updated>
                            <resolved>Mon, 17 Aug 2020 09:01:20 +0000</resolved>
                                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="3341943" author="kaloian.manassiev" created="Mon, 17 Aug 2020 09:01:20 +0000"  >&lt;p&gt;Gone away as a result of the changes for PM-1645, Milestone 1.&lt;/p&gt;</comment>
                            <comment id="2493457" author="matthew.saltz" created="Mon, 21 Oct 2019 20:34:56 +0000"  >&lt;p&gt;I think there&apos;s another solution to this. We can move this write to prior to the writes critical section and make it a majority write, and then add a different portion of the critical section for blocking secondary reads. In other words, currently, the flow on the donor is like:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Enter the writes critical section&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Locally&lt;/b&gt; write the crit sec counter to signal to secondaries to refresh from the primary. This currently has to be after entering the writes critical section because &lt;a href=&quot;https://github.com/mongodb/mongo/blob/8e59506fde602c19e13f83f6bb02d47db403ae70/src/mongo/db/s/flush_routing_table_cache_updates_command.cpp#L119-L126&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;flushRoutingTableCacheUpdates blocks behind the writes critical section&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Do commit on donor&lt;/li&gt;
	&lt;li&gt;Enter reads critical section&lt;/li&gt;
	&lt;li&gt;Commit on config server&lt;/li&gt;
	&lt;li&gt;Exit critical section&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;I think we can add a new &quot;block secondary reads (BSR)&quot; phase so it looks like:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Enter the BSR critical section&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Majority&lt;/b&gt; write the crit sec counter to signal to secondaries to refresh from the primary. Refreshing from the primary will block behind the BSR critical section.&lt;/li&gt;
	&lt;li&gt;Enter writes critical section&lt;/li&gt;
	&lt;li&gt;... Same as before&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;We could also just have a separate concept from the critical section that gives secondary refreshes something to block on.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=schwerin&quot; class=&quot;user-hover&quot; rel=&quot;schwerin&quot;&gt;schwerin&lt;/a&gt; for pointing out that we probably don&apos;t need to do this write during the critical section, which I think is true.&lt;/p&gt;</comment>
                            <comment id="2352875" author="matthew.saltz" created="Wed, 31 Jul 2019 18:46:26 +0000"  >&lt;p&gt;Just to follow up: I think majority writing the critical section counter would by far be the simplest and easiest to understand fix. &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt; are you confident that this would be too expensive? Do you think it&apos;s necessary to go with the &apos;clear metadata for all collections&apos; approach? I think we&apos;d also have to think carefully about that approach especially given the recent discussion around clearing the _receivingChunks list when we clear the active metadata. It could also cause a &quot;refresh storm&quot; after they are all cleared. Is that a concern?&lt;/p&gt;</comment>
                            <comment id="2321742" author="esha.maharishi@10gen.com" created="Thu, 11 Jul 2019 19:34:28 +0000"  >&lt;p&gt;Ah, that&apos;s true :/&lt;/p&gt;</comment>
                            <comment id="2321709" author="matthew.saltz" created="Thu, 11 Jul 2019 19:18:51 +0000"  >&lt;p&gt;I think that would race with secondaries updating their routing tables from the primary which blocks behind &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/flush_routing_table_cache_updates_command.cpp#L118-L129&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="2321700" author="esha.maharishi@10gen.com" created="Thu, 11 Jul 2019 19:13:27 +0000"  >&lt;p&gt;I wonder if we could write the enterCriticalSectionCounter flag just before the primary enters the critical section, &lt;a href=&quot;https://github.com/mongodb/mongo/blob/1719255a297343227cd9a190a3280fb8fef2c488/src/mongo/db/s/migration_source_manager.cpp#L330-L335&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;the way we do the minOpTimeRecovery document&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This means secondaries would enter the critical section as soon as they see the flag, but the primary would not until the flag has majority replicated.&lt;/p&gt;

&lt;p&gt;We could also wait for majority just once after writing both the minOpTime and enterCriticalSectionCounter flag.&lt;/p&gt;</comment>
                            <comment id="2321640" author="matthew.saltz" created="Thu, 11 Jul 2019 18:53:39 +0000"  >&lt;p&gt;I had talked to &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt; shortly after this ticket was filed, and he suggested that making the critical section counter write wait for majority write concern could be too costly since it happens in the critical section. So I think he suggested that &lt;a href=&quot;https://github.com/mongodb/mongo/blob/27807650274531ee8031cb989c2ed33bdc9bee21/src/mongo/db/s/sharding_state_recovery.cpp#L242-L257&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;when recovering the sharding state, if we see that there were in flight migrations&lt;/a&gt; we should clear out the filtering metadata for all collections. Does this sound problematic?&lt;/p&gt;</comment>
                            <comment id="2309254" author="kaloian.manassiev" created="Tue, 2 Jul 2019 12:50:32 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=renctan&quot; class=&quot;user-hover&quot; rel=&quot;renctan&quot;&gt;renctan&lt;/a&gt; pointed me to the mistake in my reasoning yesterday: having an outstanding minOpTime recovery increment after a step-up only instructs the shard to sync up its config server opTime, but doesn&apos;t cause a refresh of the collection sharding state. Because of this, a node becoming primary can still continue to accept reads and writes at a stale version.&lt;/p&gt;</comment>
                            <comment id="2306724" author="esha.maharishi@10gen.com" created="Mon, 1 Jul 2019 14:21:00 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt; I still think there is a bug: having a non-empty minOpTime &lt;a href=&quot;https://github.com/mongodb/mongo/blob/9202b2024c943984bf6436c2110ef0220bf0927c/src/mongo/db/repl/replication_coordinator_external_state_impl.cpp#L768&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;on stepup&lt;/a&gt; simply makes a new primary &lt;a href=&quot;https://github.com/mongodb/mongo/blob/9202b2024c943984bf6436c2110ef0220bf0927c/src/mongo/db/s/sharding_state_recovery.cpp#L242-L257&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;get the latest configOpTime&lt;/a&gt;, so that &lt;em&gt;later&lt;/em&gt; refreshes will use the latest configOpTime. It does not cause the new primary to synchronously force refreshes for any collections before accepting requests.&lt;/p&gt;</comment>
                            <comment id="2306701" author="renctan" created="Mon, 1 Jul 2019 14:13:33 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt; The write to the recovery document happens before the update to the config collection that will cause the secondary to refresh. So if the secondary did not see that write when it becomes the primary, it can have stale collection metadata even if the config opTime is up to date, since nothing will prompt it to talk to the config server.&lt;/p&gt;</comment>
                            <comment id="2306444" author="kaloian.manassiev" created="Mon, 1 Jul 2019 10:00:52 +0000"  >&lt;blockquote&gt;&lt;p&gt;5. A new primary is elected which does not know that a migration has occurred, and could continue serving requests for a router which is equally as stale as the secondary, leading to stale data being read.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.saltz&quot; class=&quot;user-hover&quot; rel=&quot;matthew.saltz&quot;&gt;matthew.saltz&lt;/a&gt;, I don&apos;t think this will actually happen. The donor does two more activities as part of committing the migration and these are:&lt;br/&gt;
6. Perform an &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/migration_source_manager.cpp#L331&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;increment&lt;/a&gt; of the &lt;em&gt;minOpTime&lt;/em&gt; recovery document with majority write concern&lt;br/&gt;
7. Perform a &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/migration_source_manager.cpp#L760&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;decrement&lt;/a&gt; with local, but only after the &lt;a href=&quot;#L756]&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;post-commit refresh has completed&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Therefore, if after step (5) above a new primary is elected, which never sees the write at step (2), but the migration is committed, it must at least see the increment of the minOpTime recovery document and will therefore sync with the config server and catch up to the correct shard version. Alternatively, if it happens to see the decrement from step (7), it must also have seen the write from step (2) and &lt;a href=&quot;https://github.com/mongodb/mongo/blob/622f65ad9147ae49592148e1f8e7e274af8724be/src/mongo/db/s/shard_server_op_observer.cpp#L290&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;cleared its metadata in order to force a refresh&lt;/a&gt;, so it won&apos;t be accepting writes at the old version.&lt;/p&gt;

&lt;p&gt;I agree though that the algorithm is not very well described or commented in the code and it took me a while to read that through. Could we maybe formalize it a bit as part of the new writes, which you are adding as part of the Range Deleter project?&lt;/p&gt;
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 1 Jul 2019 10:00:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        3 years, 25 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-1645</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            3 years, 25 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>esha.maharishi@mongodb.com</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>matthew.saltz@mongodb.com</customfieldvalue>
            <customfieldvalue>randolph@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hv7etr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hv2plj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hv7133:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>