<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 09:05:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[KAFKA-247] Recreate change stream from the point of failure for event &gt; 16 MB</title>
                <link>https://jira.mongodb.org/browse/KAFKA-247</link>
                <project id="16285" key="KAFKA">Kafka Connector</project>
                    <description>&lt;p&gt;When a change event in the change stream exceeds the 16MB limit, the existing change stream is closed with an exception and a new change stream is opened. In a system with a high update load, change events that occur while the new change stream is starting will likely be missed. I have two proposals for improvement.&lt;/p&gt;

&lt;h3&gt;&lt;a name=&quot;Solution%231&quot;&gt;&lt;/a&gt;Solution #1&lt;/h3&gt;
&lt;p&gt;The error message of the exception contains the resumeToken of the failed event. Use &quot;ChangeStream.startAfter(&amp;lt;resumeToken&amp;gt;)&quot; to start the new stream just after the failed event, so that no events are lost.&lt;/p&gt;

&lt;p&gt;Example error message:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;BSONObj size: 19001449 (0x121F069) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: &quot;826115AEE9000000012B022C0100296E5A1004D148D6B22E8F49B3A65DAE80A4683566463C5F6964003C316663726B36326F6D30303030303030000004&quot; }&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;
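&lt;p&gt;As a minimal sketch of how the token could be recovered from a message like the one above, the Python snippet below pulls the hex string out of the &quot;_data&quot; field with a regular expression. The helper name extract_resume_token and the pattern are assumptions for illustration, not part of the connector or any driver, and they assume the message always has the shape shown in the example.&lt;/p&gt;

```python
import re

# Assumed pattern: matches the "First element: _id: { _data: "..." }" tail
# of the example error message and captures the hex resume token.
RESUME_TOKEN_PATTERN = re.compile(r'_id: \{ _data: "([0-9A-Fa-f]+)" \}')


def extract_resume_token(error_message):
    """Return a resume-token-shaped dict, or None if no token is found."""
    match = RESUME_TOKEN_PATTERN.search(error_message)
    if match is None:
        return None
    return {"_data": match.group(1)}


# The example error message from this ticket, split only for readability.
message = (
    "BSONObj size: 19001449 (0x121F069) is invalid. "
    "Size must be between 0 and 16793600(16MB) "
    'First element: _id: { _data: "826115AEE9000000012B022C0100296E5A1004'
    "D148D6B22E8F49B3A65DAE80A4683566463C5F6964003C316663726B36326F6D3030"
    '3030303030000004" }'
)

token = extract_resume_token(message)
```

&lt;p&gt;The returned dict has the same {&quot;_data&quot;: ...} shape as a resume token, so it could in principle be handed to a driver&apos;s start-after option. Whether the server accepts a token recovered this way is not established here.&lt;/p&gt;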


&lt;h3&gt;&lt;a name=&quot;Solution%232&quot;&gt;&lt;/a&gt;Solution #2&lt;/h3&gt;
&lt;p&gt;Increment the &quot;clusterTime&quot; (introduced in v4.0) available in the MongoCommandException by 1 ordinal, and pass it to &quot;ChangeStream.startAtOperationTime(&amp;lt;BsonTimestamp&amp;gt;)&quot;.&lt;/p&gt;

&lt;p&gt;In a sharded cluster, multiple events may share the same cluster time, so this approach can skip a few good events that have the same timestamp as the bad one.&lt;/p&gt;
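
&lt;p&gt;The increment described in Solution #2 can be sketched as follows, assuming the cluster time is a BSON timestamp made of a 32-bit seconds value and a 32-bit increment (ordinal). The function name next_timestamp, the plain tuple representation, and the sample values are hypothetical stand-ins for the driver&apos;s timestamp type and a real cluster time.&lt;/p&gt;

```python
# Largest value the 32-bit increment (ordinal) part of a BSON timestamp can hold.
MAX_INCREMENT = 0xFFFFFFFF


def next_timestamp(seconds, increment):
    """Return the (seconds, increment) pair that immediately follows the input."""
    if increment == MAX_INCREMENT:
        # The ordinal is exhausted for this second, so roll over to the next one.
        return (seconds + 1, 0)
    return (seconds, increment + 1)


# Hypothetical cluster time taken from a failed command.
failed_at = (1628811945, 7)
resume_at = next_timestamp(*failed_at)
```

&lt;p&gt;Starting at resume_at skips every event stamped with failed_at, which is exactly why events from other shards that share that timestamp can be lost.&lt;/p&gt;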

</description>
                <environment></environment>
        <key id="1850751">KAFKA-247</key>
            <summary>Recreate change stream from the point of failure for event &gt; 16 MB</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="10300" iconUrl="https://jira.mongodb.org/images/icons/priorities/medium.svg">Unknown</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="dhruvangmakadia1@gmail.com">Dhruvang Makadia</reporter>
                        <labels>
                            <label>external-user</label>
                    </labels>
                <created>Thu, 12 Aug 2021 23:45:41 +0000</created>
                <updated>Fri, 1 Sep 2023 14:39:45 +0000</updated>
                            <resolved>Tue, 25 Jul 2023 13:40:45 +0000</resolved>
                                                                    <component>Source</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="5589470" author="robert.walters" created="Tue, 25 Jul 2023 13:40:45 +0000"  >&lt;p&gt;Handling of large messages will be implemented with &lt;a href=&quot;https://jira.mongodb.org/browse/KAFKA-381&quot; title=&quot;Support change stream split large events&quot; class=&quot;issue-link&quot; data-issue-key=&quot;KAFKA-381&quot;&gt;KAFKA-381&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="5589371" author="ross@10gen.com" created="Tue, 25 Jul 2023 13:11:47 +0000"  >&lt;p&gt;I think this ticket should be closed as &quot;Won&apos;t fix&quot; as we cannot resume a change stream from the point of failure.&lt;/p&gt;

&lt;p&gt;Recommend directing users to &lt;a href=&quot;https://jira.mongodb.org/browse/KAFKA-381&quot; title=&quot;Support change stream split large events&quot; class=&quot;issue-link&quot; data-issue-key=&quot;KAFKA-381&quot;&gt;KAFKA-381&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4003436" author="ross@10gen.com" created="Tue, 17 Aug 2021 08:00:00 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dhruvangmakadia1%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;dhruvangmakadia1@gmail.com&quot;&gt;dhruvangmakadia1@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;The last seen resume token is stored as the offset, so resiliency is there for other events: the connector will continue after the last seen event. This particular exception is non-resumable because the last consumed event occurs before the too-large event, so a change stream restarted at the last seen (processed) event would hit the same error again.&lt;/p&gt;

&lt;p&gt;So the challenge is to capture the message-too-large error and process it differently from other errors (essentially skipping that event). However, that will depend on the user&apos;s configuration, as skipping that event will result in data loss. The only way to ensure no data loss would be to restart and go through the copy data process.&lt;/p&gt;

&lt;p&gt;Ross&lt;/p&gt;</comment>
                            <comment id="4002040" author="JIRAUSER1261663" created="Mon, 16 Aug 2021 17:24:08 +0000"  >&lt;p&gt;Hi Ross Lawley,&lt;/p&gt;

&lt;p&gt;Although I investigated and filed this ticket only for the large event exception, I wonder whether a similar improvement can be made for other exceptions that terminate the change stream as well. In an ideal world, we would like to have no data loss between Kafka and updates to MongoDB.&lt;/p&gt;</comment>
                            <comment id="4001426" author="ross@10gen.com" created="Mon, 16 Aug 2021 14:55:01 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dhruvangmakadia1%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;dhruvangmakadia1@gmail.com&quot;&gt;dhruvangmakadia1@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thanks for the ticket. This is something we can look into improving.&lt;/p&gt;

&lt;p&gt;Unfortunately, until &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-55062&quot; title=&quot;Change stream events can exceed 16MB with no workaround&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-55062&quot;&gt;&lt;del&gt;SERVER-55062&lt;/del&gt;&lt;/a&gt; is implemented, a large change stream document will result in some data loss.&lt;/p&gt;

&lt;p&gt;All the best,&lt;/p&gt;

&lt;p&gt;Ross&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="11021">
                    <name>Design</name>
                                                                <inwardlinks description="design is described in">
                                        <issuelink>
            <issuekey id="2400047">KAFKA-381</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="1644511">SERVER-55062</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2400047">KAFKA-381</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr3mmh:0400000950x</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>