<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:38:33 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-56170] Investigate why some oplog entries generated during tenant migration for timeseries bucket collections require stricter than normal idempotency guarantees</title>
                <link>https://jira.mongodb.org/browse/SERVER-56170</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;During the course of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55501&quot; title=&quot;Avoid element-wise iteration and copy when appending to an object in doc_diff::applyDiff&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55501&quot;&gt;&lt;del&gt;SERVER-55501&lt;/del&gt;&lt;/a&gt;, we added an optimization for oplog diff application for certain scenarios where we know about the structure of the pre-image and the diff, and can guarantee that fields which are inserted by the diff do not already exist in the pre-image.&lt;/p&gt;

&lt;p&gt;In the case of updates that happen as a result of timeseries inserts through the normal BucketCatalog machinery, we know that the resulting oplog entry which is applied on the primary should satisfy these conditions. Additionally, we know that the corresponding entry when applied on a secondary in steady state should also qualify.&lt;/p&gt;

&lt;p&gt;What we found is that tenant migrations complicate this. In particular, it looks like we need to disable the optimization on the primary, even when the write goes through the BucketCatalog, if the write comes from a tenant migration replaying the oplog. After talking it through, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=lingzhi.deng&quot; class=&quot;user-hover&quot; rel=&quot;lingzhi.deng&quot;&gt;lingzhi.deng&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dan.larkin-york&quot; class=&quot;user-hover&quot; rel=&quot;dan.larkin-york&quot;&gt;dan.larkin-york&lt;/a&gt; concluded that the secondary should in theory be able to apply any entries generated by the primary blindly with the optimization, without checking whether they resulted from a tenant migration. However, this didn&apos;t appear to be the case: some entries still resulted in field duplication, and thus required the check for a tenant migration source.&lt;/p&gt;

&lt;p&gt;It remains unclear why we sometimes generate entries that require strict idempotency guarantees, which are normally not required for writes coming through the BucketCatalog. It may be that something is going wrong at the BucketCatalog layer, or it may be that tenant migrations are doing something unexpected, or any number of other things. The goal of this ticket is simply to understand what&apos;s going on here.&lt;/p&gt;</description>
                <environment></environment>
        <key id="1682845">SERVER-56170</key>
            <summary>Investigate why some oplog entries generated during tenant migration for timeseries bucket collections require stricter than normal idempotency guarantees</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="dan.larkin-york@mongodb.com">Dan Larkin-York</assignee>
                                    <reporter username="dan.larkin-york@mongodb.com">Dan Larkin-York</reporter>
                        <labels>
                    </labels>
                <created>Mon, 19 Apr 2021 15:51:47 +0000</created>
                <updated>Mon, 17 May 2021 21:41:40 +0000</updated>
                            <resolved>Mon, 17 May 2021 19:50:41 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="3778557" author="geert.bosch" created="Mon, 17 May 2021 21:41:40 +0000"  >&lt;p&gt;So, to confirm my understanding, tenant migration depends on oplog application being idempotent, even on the primary in normal operation? If so, it seems reasonable to include tenant migration among the conditions under which we do not apply our optimization.&lt;/p&gt;</comment>
                            <comment id="3778160" author="JIRAUSER1258161" created="Mon, 17 May 2021 19:50:16 +0000"  >&lt;p&gt;After digging through code and discussing the expected semantics of the v:2 doc_diff application, the current behavior seems to be expected. I&apos;ll summarize below.&lt;/p&gt;

&lt;p&gt;During a tenant migration, the recipient primary performs an initial sync procedure with the donor primary. First it gets a dump of the collection, then it catches up on any changes since the dump by replaying a portion of the donor&apos;s oplog. The tricky bit here is that the portion of the oplog that it replays may contain some operations that were already reflected in the dump. That is to be expected, but it interacts in a funny way with the v:2 doc_diff format.&lt;/p&gt;

&lt;p&gt;The v:2 doc diff format has a crucial property for idempotency: that when you reapply any suffix of the diff chain in order, you&apos;ll end up at the same result. That is, if you have two diffs x and y, you can end up getting a chain like:&lt;/p&gt;
&lt;pre&gt;apply(A, x) -&amp;gt; B
apply(B, y) -&amp;gt; C
apply(C, x) -&amp;gt; D
apply(D, y) -&amp;gt; C&lt;/pre&gt;
&lt;p&gt;Now, since this oplog replay is happening on the recipient primary, any update that isn&apos;t a no-op will generate a new oplog entry. So in this chain, each application would result in a new oplog entry, even though we end up back at the same state (C) as we were at a previous step. Subsequently, each of these oplog entries will be applied on the recipient secondary.&lt;/p&gt;
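
&lt;p&gt;As a rough illustration of this chain, here is a minimal Python sketch. It is a simplified model and not the server&apos;s doc_diff implementation: a document is an ordered dict, a diff is a dict of fields to insert, and the names apply_diff and replay_on_primary are hypothetical. It assumes the primary logs an entry whenever an application changes the document, including its field order.&lt;/p&gt;

```python
# Simplified model (an assumption, not the server's doc_diff code): a document
# is an ordered dict and a diff is a dict of fields to insert.

def apply_diff(doc, diff):
    """Insert each field with reinsertion (move-to-end) semantics."""
    result = dict(doc)
    for field, value in diff.items():
        result.pop(field, None)  # drop the old position if the field exists
        result[field] = value    # append at the end
    return result

def replay_on_primary(doc, diffs):
    """Apply diffs in order and log one entry per application that changes the document."""
    oplog = []
    for diff in diffs:
        updated = apply_diff(doc, diff)
        # Field order counts as part of the document, so compare ordered items.
        if list(updated.items()) != list(doc.items()):
            oplog.append(diff)
        doc = updated
    return doc, oplog

A = {"_id": 1}
x = {"m0": 10}
y = {"m1": 11}

C = apply_diff(apply_diff(A, x), y)                 # A, then B, then C
final, oplog = replay_on_primary(A, [x, y, x, y])   # A, B, C, D, then C again

print(list(final.items()) == list(C.items()))  # True: same end state as C
print(len(oplog))                              # 4: every application was logged
```

&lt;p&gt;In this model, replaying x on top of C moves m0 to the end, so the step is not a no-op even though the final state after replaying y matches C again.&lt;/p&gt;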

&lt;p&gt;This matters for the optimization introduced in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55501&quot; title=&quot;Avoid element-wise iteration and copy when appending to an object in doc_diff::applyDiff&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55501&quot;&gt;&lt;del&gt;SERVER-55501&lt;/del&gt;&lt;/a&gt; in the case of field insertion. The v:2 doc_diff format treats the insertion of a field that already exists as a reinsertion (or move-to-end). Thus, when we insert a measurement that already exists in a timeseries bucket document, it&apos;s reinserted, and we generate a new oplog entry instead of treating it as a no-op. We therefore need to disable the optimization and use full idempotency guarantees for any oplog entries generated by tenant migrations.&lt;/p&gt;
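
&lt;p&gt;To make the field duplication concrete, the following Python sketch contrasts the two insertion paths. It is a simplified model under stated assumptions, not the actual doc_diff::applyDiff code: the measurements are a list of (field, value) pairs so that a repeated field name can be represented, and the function names are hypothetical.&lt;/p&gt;

```python
# Simplified model (an assumption): a bucket's measurements are a list of
# (field, value) pairs, so a repeated field name can be represented.

def insert_with_reinsertion(fields, name, value):
    """Fully idempotent path: drop any existing occurrence, then append."""
    return [(k, v) for (k, v) in fields if k != name] + [(name, value)]

def insert_blind_append(fields, name, value):
    """Optimized path: assumes the field is absent and appends without checking."""
    return fields + [(name, value)]

# State copied from the dump, which already reflects measurement "1".
from_dump = [("0", 20.5), ("1", 21.0)]

# The replayed oplog entry inserts measurement "1" a second time.
safe = insert_with_reinsertion(from_dump, "1", 21.0)
unsafe = insert_blind_append(from_dump, "1", 21.0)

print([k for k, _ in safe])    # ['0', '1']
print([k for k, _ in unsafe])  # ['0', '1', '1']
```

&lt;p&gt;The blind append is only safe when the inserted field is known to be absent from the pre-image, which is the condition a replay over an already-applied portion of the oplog violates.&lt;/p&gt;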

&lt;p&gt;Importantly, we need to take note if any future projects introduce a similar mechanism to tenant migration where a primary replays a portion of an oplog that overlaps with operations it has already applied, and add exceptions to the optimization for these as well.&lt;/p&gt;

&lt;p&gt;In doc diff v:3, we should be able to introduce a new type of insert operation (insert2 or something) which does not perform reinsertion in case the field already exists. That would render such replay operations for timeseries collections no-ops, and would not generate new oplog entries.&lt;/p&gt;</comment>
                            <comment id="3725330" author="JIRAUSER1258161" created="Mon, 19 Apr 2021 15:52:35 +0000"  >&lt;p&gt;Assigning to storage execution. Replication can assist if needed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="1658073">SERVER-55501</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 17 May 2021 21:41:40 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 38 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-55501'>SERVER-55501</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>geert.bosch@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 38 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>dan.larkin-york@mongodb.com</customfieldvalue>
            <customfieldvalue>geert.bosch@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz5853:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hyt2rr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="4878">Execution Team 2021-05-31</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz4ue7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>