<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:53:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-61933] Interleaved operations during sharded collection drop violate change stream guarantees</title>
                <link>https://jira.mongodb.org/browse/SERVER-61933</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Following a conversation with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt;, it appears that while dropping a sharded collection, it is possible for CRUD or DDL events to interleave with the collection drops from the individual shards, based on their respective &lt;tt&gt;clusterTimes&lt;/tt&gt;:&lt;/p&gt;

&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;&amp;nbsp;&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Shard 1&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Shard 2&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Shard 3&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;T1&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;drop&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;T2&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;insert&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;T3&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;drop&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;T4&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;drop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;In the above scenario, for a single-collection stream, this will cause change streams to break in two significant ways:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;As soon as the first &lt;tt&gt;drop&lt;/tt&gt; is observed, we will invalidate the stream. The &lt;tt&gt;insert&lt;/tt&gt; event is therefore never seen by the client.&lt;/li&gt;
	&lt;li&gt;If the client attempts to &lt;tt&gt;startAfter&lt;/tt&gt; the &lt;tt&gt;invalidate&lt;/tt&gt; resume token, then we will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/ce6aa8504efc89566558eba86bb753076b36472c/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.cpp#L133-L142&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;attempt to swallow the drops on all shards to prevent an immediate re-invalidation of the stream&lt;/a&gt;. However, this mechanism is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/ce6aa8504efc89566558eba86bb753076b36472c/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.cpp#L172-L175&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;based on the expectation that the next event we see after the resume point is an invalidating event&lt;/a&gt;. In the above case, on shard 2 we will stop swallowing as soon as we see the &lt;tt&gt;insert&lt;/tt&gt;, which means that the subsequent &lt;tt&gt;drop&lt;/tt&gt; on the same shard will re-invalidate the resumed stream.&lt;/li&gt;
&lt;/ul&gt;
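&lt;p&gt;Both failure modes can be reproduced with a small simulation of the table above. This is a toy model and not the server implementation: the event tuples, &lt;tt&gt;single_collection_stream&lt;/tt&gt; and &lt;tt&gt;resume_after_invalidate&lt;/tt&gt; are hypothetical names, and the swallowing rule is simplified to "on each shard, skip drops until the first non-drop event is seen".&lt;/p&gt;

```python
# Toy model of the timeline in the table above; not MongoDB server code.
# Each event is (cluster_time, shard, op). All names here are hypothetical.
EVENTS = [
    (1, "shard1", "drop"),
    (2, "shard2", "insert"),
    (3, "shard2", "drop"),
    (4, "shard3", "drop"),
]


def single_collection_stream(events):
    """Return what a client sees: events in clusterTime order up to the first drop."""
    seen = []
    for cluster_time, shard, op in sorted(events):
        if op == "drop":
            # The first drop invalidates the stream; nothing after it is delivered.
            seen.append((cluster_time, shard, "invalidate"))
            break
        seen.append((cluster_time, shard, op))
    return seen


def resume_after_invalidate(events, invalidate_time):
    """Simplified startAfter: per shard, swallow drops until a non-drop event is seen."""
    later = [e for e in sorted(events) if e[0] > invalidate_time]
    swallowing = {}  # shard name mapped to "still swallowing drops?"
    seen = []
    for cluster_time, shard, op in later:
        if op == "drop" and swallowing.get(shard, True):
            continue  # swallowed, so the resumed stream is not re-invalidated yet
        swallowing[shard] = False  # first non-drop event ends swallowing on this shard
        if op == "drop":
            seen.append((cluster_time, shard, "invalidate"))
            break
        seen.append((cluster_time, shard, op))
    return seen


if __name__ == "__main__":
    print(single_collection_stream(EVENTS))
    print(resume_after_invalidate(EVENTS, invalidate_time=1))
```

&lt;p&gt;In this model the original stream delivers only the invalidate at T1, so the insert at T2 is lost. The resumed stream delivers the insert and is then invalidated again by the shard 2 drop at T3.&lt;/p&gt;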


&lt;p&gt;In order for change stream semantics to remain robust through collection drops, &lt;b&gt;there cannot be any CRUD or DDL events on that collection at or later than the &lt;tt&gt;clusterTime&lt;/tt&gt; of the earliest drop event across all shards.&lt;/b&gt; Max tells me that this is already the case for sharded &lt;tt&gt;renameCollection&lt;/tt&gt;, and that it may be possible to apply a similar approach to &lt;tt&gt;drop&lt;/tt&gt;. The individual collection drops performed by the &lt;tt&gt;dropDatabase&lt;/tt&gt; command should similarly obey the above constraints.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1943610">SERVER-61933</key>
            <summary>Interleaved operations during sharded collection drop violate change stream guarantees</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-execution">Backlog - Query Execution</assignee>
                                    <reporter username="bernard.gorman@mongodb.com">Bernard Gorman</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 Dec 2021 23:13:31 +0000</created>
                <updated>Wed, 1 Feb 2023 23:55:14 +0000</updated>
                                                                                                <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="4285037" author="JIRAUSER1257318" created="Tue, 11 Jan 2022 08:46:32 +0000"  >&lt;blockquote&gt;&lt;p&gt;As for whether it is guaranteed to happen&#160;&lt;b&gt;before&lt;/b&gt;&#160;any new events on the same namespace: I am certain that we must have such a guarantee, because without it we can potentially wipe out a recreated collection after a step-down in the middle of a drop.&#160;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jordi.serra-torrens&quot; class=&quot;user-hover&quot; rel=&quot;jordi.serra-torrens&quot;&gt;jordi.serra-torrens&lt;/a&gt;, do you happen to know how exactly we ensure that? Is it a matter of us not releasing the critical section before we have deleted the coordinator document?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;ShardingDDLCoordinator will first &lt;a href=&quot;https://github.com/mongodb/mongo/blob/204d6eed96ed5151b1d11b22d128d923b452833f/src/mongo/db/s/sharding_ddl_coordinator.cpp#L263&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;remove the coordinator document&lt;/a&gt; for the drop and then &lt;a href=&quot;https://github.com/mongodb/mongo/blob/204d6eed96ed5151b1d11b22d128d923b452833f/src/mongo/db/s/sharding_ddl_coordinator.cpp#L300&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;release the distlocks&lt;/a&gt;. This guarantees that the coordinator document is indeed removed before any other coordinator for that ns can run. However, I&apos;m having a hard time seeing how we prevent an implicit (unsharded) collection creation from sneaking in at that point, since we don&apos;t even take the critical section on the dropCollection coordinator.&lt;/p&gt;</comment>
                            <comment id="4284971" author="kaloian.manassiev" created="Tue, 11 Jan 2022 07:09:33 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bernard.gorman&quot; class=&quot;user-hover&quot; rel=&quot;bernard.gorman&quot;&gt;bernard.gorman&lt;/a&gt;, apologies for the delayed reply. The DDL coordinator doesn&apos;t emit any special event to mark the end of the operation, because we never thought we would need one. It so happens, though, that the last thing every coordinator does is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/204d6eed96ed5151b1d11b22d128d923b452833f/src/mongo/db/s/sharding_ddl_coordinator.cpp#L77&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;delete the coordinator document&lt;/a&gt;. This event is guaranteed to happen &lt;b&gt;after&lt;/b&gt; all participant operations (e.g., drop events on the shards) have completed and been majority committed.&lt;/p&gt;

&lt;p&gt;As for whether it is guaranteed to happen &lt;b&gt;before&lt;/b&gt; any new events on the same namespace: I am certain that we must have such a guarantee, because without it we can potentially wipe out a recreated collection after a step-down in the middle of a drop. &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jordi.serra-torrens&quot; class=&quot;user-hover&quot; rel=&quot;jordi.serra-torrens&quot;&gt;jordi.serra-torrens&lt;/a&gt;, do you happen to know how exactly we ensure that? Is it a matter of us not releasing the critical section before we have deleted the coordinator document?&lt;/p&gt;</comment>
                            <comment id="4260989" author="bernard.gorman" created="Tue, 21 Dec 2021 02:52:05 +0000"  >&lt;p&gt;Hey &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt;: does the DDL co-ordinator emit a separate oplog event to mark the end of the entire operation? If so then I think this could work, assuming that the following points are both guaranteed:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;This event will be written into the oplog with a later &lt;tt&gt;optime&lt;/tt&gt; than the &lt;tt&gt;drop&lt;/tt&gt; events on any of the shards&lt;/li&gt;
	&lt;li&gt;This event will appear in the stream BEFORE any further operations on the same namespace - e.g. it&apos;s not possible for a collection re-creation to sneak in between the final &lt;tt&gt;drop&lt;/tt&gt; and the DDL co-ordinator end-of-operation event.&lt;/li&gt;
&lt;/ul&gt;
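&lt;p&gt;The two conditions above can be written as a check over a merged, per-namespace view of the oplog. This is only a sketch: &lt;tt&gt;end_event_is_safe&lt;/tt&gt;, the tuple layout and the &lt;tt&gt;ddlEnd&lt;/tt&gt; op name are hypothetical, and no such oplog entry is assumed to exist today.&lt;/p&gt;

```python
def end_event_is_safe(oplog, ns):
    """Check both conditions for a hypothetical end-of-operation marker.

    oplog is an iterable of (optime, namespace, op) tuples merged across shards.
    "ddlEnd" is an invented op name standing in for the end-of-operation event.
    """
    entries = sorted(e for e in oplog if e[1] == ns)
    drops = [t for t, _, op in entries if op == "drop"]
    ends = [t for t, _, op in entries if op == "ddlEnd"]
    if not drops or len(ends) != 1:
        return False
    end_time = ends[0]
    last_drop = max(drops)
    # Condition 1: the end event has a later optime than every shard's drop.
    after_all_drops = end_time > last_drop
    # Condition 2: no other operation on the namespace falls strictly between
    # the final drop and the end event (e.g. a collection re-creation).
    intruders = [op for t, _, op in entries if end_time > t > last_drop]
    return after_all_drops and not intruders


if __name__ == "__main__":
    ok = [(1, "db.c", "drop"), (3, "db.c", "drop"), (5, "db.c", "ddlEnd"), (6, "db.c", "create")]
    racy = [(1, "db.c", "drop"), (3, "db.c", "drop"), (4, "db.c", "create"), (5, "db.c", "ddlEnd")]
    print(end_event_is_safe(ok, "db.c"), end_event_is_safe(racy, "db.c"))
```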


&lt;p&gt;Another alternative would be to mark the final &lt;tt&gt;drop&lt;/tt&gt; event with some special field in the oplog, e.g. &lt;tt&gt;{o2: {finalDropAcrossShards: true}}&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="4253139" author="kaloian.manassiev" created="Thu, 16 Dec 2021 12:57:50 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bernard.gorman&quot; class=&quot;user-hover&quot; rel=&quot;bernard.gorman&quot;&gt;bernard.gorman&lt;/a&gt;, the concurrency behaviour of the sharded DDL operations was decided as part of &lt;a href=&quot;https://docs.google.com/document/d/1s8SkcFP8tA6Tt8PLuMptyaHCbYB1ex4U2PaBtLi77gw/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;the scope&lt;/a&gt; for PM-1965 and it explicitly specified the behaviour that you are describing.&lt;/p&gt;

&lt;p&gt;If this is a problem for change streams we have two ways to address it:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;We make every DDL operation 2-Phase with respect to the critical section (which is really expensive just for change streams).&lt;/li&gt;
	&lt;li&gt;We make change streams listen for the &quot;end&quot; event from the DDL coordinators and only use this for drop/rename/etc events.&lt;/li&gt;
&lt;/ol&gt;
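&lt;p&gt;One possible reading of option (2), sketched as a toy model: the stream ignores the per-shard drops and invalidates only when it sees the coordinator&apos;s "end" event. The &lt;tt&gt;dropEnd&lt;/tt&gt; op, the &lt;tt&gt;coordinator&lt;/tt&gt; source name and &lt;tt&gt;stream_with_end_event&lt;/tt&gt; are hypothetical, and the end event is assumed to be ordered after every shard&apos;s drop.&lt;/p&gt;

```python
# Toy model of option (2); not MongoDB server code. All names are hypothetical.
def stream_with_end_event(events):
    """Deliver CRUD events in clusterTime order; invalidate only on the end event."""
    seen = []
    for cluster_time, source, op in sorted(events):
        if op == "drop":
            continue  # per-shard drops are not surfaced individually
        if op == "dropEnd":
            # Assumed: a single marker written after all shards have dropped.
            seen.append((cluster_time, source, "invalidate"))
            break
        seen.append((cluster_time, source, op))
    return seen


if __name__ == "__main__":
    events = [
        (1, "shard1", "drop"),
        (2, "shard2", "insert"),
        (3, "shard2", "drop"),
        (4, "shard3", "drop"),
        (5, "coordinator", "dropEnd"),
    ]
    print(stream_with_end_event(events))
```

&lt;p&gt;Under these assumptions the client sees the insert at T2 followed by a single invalidate at T5, so a resume after that invalidate has no remaining drops to swallow.&lt;/p&gt;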


&lt;p&gt;I slightly prefer that we do (2). What do you think?&lt;/p&gt;</comment>
                            <comment id="4234365" author="max.hirschhorn@10gen.com" created="Tue, 7 Dec 2021 13:57:07 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Max tells me that this is already the case for sharded &lt;tt&gt;renameCollection&lt;/tt&gt;, and that it may be possible to apply a similar approach to &lt;tt&gt;drop&lt;/tt&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I looked back over the code and saw that the &lt;tt&gt;_shardsvrRenameCollectionParticipant&lt;/tt&gt; command (1) acquires the critical section to block writes, (2) acquires the critical section to block reads, and (3) locally renames (or renames + drops) the collection. The acquisition of the critical section isn&apos;t synchronized across all shards prior to any of them doing the local rename. This means &lt;tt&gt;rename&lt;/tt&gt; change events face the same issue Bernard described for &lt;tt&gt;drop&lt;/tt&gt; change events.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1910710">SERVER-61026</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25125"><![CDATA[Query Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 7 Dec 2021 13:57:07 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 4 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-1941</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>bernard.gorman@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 4 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-execution</customfieldvalue>
            <customfieldvalue>bernard.gorman@mongodb.com</customfieldvalue>
            <customfieldvalue>jordi.serra-torrens@mongodb.com</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>max.hirschhorn@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0def3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hzwp6v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5430">Sharding EMEA 2021-12-27</customfieldvalue>
    <customfieldvalue id="5681">Sharding EMEA 2022-01-10</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0d0kf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>