<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:44:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-58473] Change stream startup with resumeToken uses COLLSCAN and is slow</title>
                <link>https://jira.mongodb.org/browse/SERVER-58473</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;When resuming a change stream, there are expensive queries issued against the oplog using a `COLLSCAN`. Given that the oplog is naturally sorted, and given that the resume token encodes time info, why does Mongo need to start from the top of the oplog to find where the change event document? Shouldn&apos;t search on this sorted list be `O(log&amp;#40;n))`?&lt;/p&gt;

&lt;p&gt;My understanding is that the oplog is mainly for replication, and there are no indexes on the oplog. However, it does follow a sort order, so it seems Mongo can take advantage of this for faster lookups for the resume token.&lt;/p&gt;

&lt;p&gt;Is this just a limitation of the existing oplog architecture? Could this be supported in the future? Maybe what can help me better understand is: how are change stream `_id`s derived? Is the main reason for this limitation that we don&apos;t have _ids on the oplog? Even if that&apos;s a case, could we first identify the timestamp and then use that to find the respective op?&#160;&lt;/p&gt;
</description>
                <environment></environment>
        <key id="1811738">SERVER-58473</key>
            <summary>Change stream startup with resumeToken uses COLLSCAN and is slow</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="4" iconUrl="https://jira.mongodb.org/images/icons/priorities/minor.svg">Minor - P4</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13202">Works as Designed</resolution>
                                        <assignee username="bernard.gorman@mongodb.com">Bernard Gorman</assignee>
                                    <reporter username="david@vanta.com">David Zhu</reporter>
                        <labels>
                    </labels>
                <created>Thu, 8 Jul 2021 05:35:16 +0000</created>
                <updated>Fri, 27 Oct 2023 13:52:20 +0000</updated>
                            <resolved>Sun, 8 Aug 2021 03:17:35 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="3987184" author="bernard.gorman" created="Sun, 8 Aug 2021 03:14:27 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=david%40vanta.com&quot; class=&quot;user-hover&quot; rel=&quot;david@vanta.com&quot;&gt;david@vanta.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;What you have described above is, in fact, exactly what change streams does when resuming from either a resume token or via &lt;tt&gt;startAtOperationTime&lt;/tt&gt; - it will jump directly to that starting point in the oplog and beginning scanning forward in time from there. The &lt;tt&gt;COLLSCAN&lt;/tt&gt; simply indicates that the query is not using an index; as you rightly point out, the oplog does not have any indexes in order to avoid write amplification during replication, but it is implicitly sorted by &lt;tt&gt;ts&lt;/tt&gt;. The &lt;tt&gt;COLLSCAN&lt;/tt&gt; does not, however, mean that the scan is starting from the beginning of the oplog.&lt;/p&gt;

&lt;p&gt;You can confirm this for yourself by issuing an &lt;tt&gt;explain&lt;/tt&gt; on a change stream:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;db.coll.explain().aggregate([{$changeStream: {resumeAfter: resume_token}}])&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;If you examine the output, you will see that the first stage of the pipeline is a &lt;tt&gt;$cursor&lt;/tt&gt; stage with namespace &lt;tt&gt;local.oplog.rs&lt;/tt&gt;. You will further observe a field in this stage which looks something like the following:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	&quot;direction&quot; : &quot;forward&quot;,&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;	&quot;minTs&quot; : Timestamp(1628391155, 1)&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;The &lt;tt&gt;minTs&lt;/tt&gt; field indicates that the time-based optimization is being used for this query, and that we have skipped directly to this timestamp before continuing our forward scan. Any query on the oplog which includes a &lt;tt&gt;$gt&lt;/tt&gt;, &lt;tt&gt;$gte&lt;/tt&gt;, or equality predicate in the query will take advantage of this optimization. There is also a similar &lt;tt&gt;maxTs&lt;/tt&gt; optimization for &lt;tt&gt;$lt&lt;/tt&gt; and &lt;tt&gt;$lte&lt;/tt&gt; predicates.&lt;/p&gt;

&lt;p&gt;Of course, if you are starting a change stream from a point in the very distant past of the oplog, then the stream can only skip a relatively small percentage of the total documents before scanning through the remainder. To guard against this, you should ensure that your application retrieves the latest resume token from the driver on every batch. Even if no new results are returned and the batch is empty, the resume token will continue to advance to reflect the latest point in the oplog, meaning that your resume point should always be near the end.&lt;/p&gt;

&lt;p&gt;Hope this helps!&lt;/p&gt;

&lt;p&gt;Best regards,&lt;br/&gt;
Bernard&lt;/p&gt;</comment>
                            <comment id="3930306" author="joseph.dani" created="Tue, 13 Jul 2021 13:02:28 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=david%40vanta.com&quot; class=&quot;user-hover&quot; rel=&quot;david@vanta.com&quot;&gt;david@vanta.com&lt;/a&gt;&#160;- The Data Architecture (DARCH) team handles questions for our internal data platform.&#160; I believe this question is meant for the Product team, so moving over to SERVER Jira project.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000wWpkLQAS]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 13 Jul 2021 13:02:28 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 26 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 26 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>bernard.gorman@mongodb.com</customfieldvalue>
            <customfieldvalue>david@vanta.com</customfieldvalue>
            <customfieldvalue>joseph.dani@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzr21b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hzbdkf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="4713">QE 2021-08-09</customfieldvalue>
    <customfieldvalue id="4715">QE 2021-08-23</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[edwin.zhou@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzqoaf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>