<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 09:05:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[KAFKA-151] Out of Memory Issue with source connector in certain scenario</title>
                <link>https://jira.mongodb.org/browse/KAFKA-151</link>
                <project id="16285" key="KAFKA">Kafka Connector</project>
                    <description>&lt;p&gt;&lt;ins&gt;Setup:&lt;/ins&gt;&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;I have a fairly large document (&amp;gt;8MB), and in the document I have an array, say for example &quot;studentids&quot;, which has 100K-150K ids: &quot;studentids&quot; : &amp;#91;NumberLong(&quot;906019125703444&quot;), NumberLong(&quot;326026735808036&quot;), ...&amp;#93;&lt;/li&gt;
	&lt;li&gt;In the connector configuration I have&#160;&quot;change.stream.full.document&quot;: &quot;updateLookup&quot;, since we need the full document for every update.&lt;/li&gt;
	&lt;li&gt;I have a small utility that updates the document in a loop, around 1000 times.&lt;/li&gt;
	&lt;li&gt;The above results in an OOM exception (log attached). Although the error surfaces at com.mongodb.kafka.connect.source.MongoSourceTask.poll(MongoSourceTask.java:192), we suspect the root cause is in how the connector handles this type of data.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;If we drop that particular field from the document with a pipeline, for example&#160;&lt;tt&gt;&quot;pipeline&quot;: &quot;[ { $project: { \&quot;fullDocument.studentids\&quot;: 0 } } ]&quot;&lt;/tt&gt;, we no longer see the issue.&lt;/p&gt;

&lt;p&gt;Can you please confirm the issue and provide us with a valid configuration to handle this kind of data? Thanks in advance.&lt;/p&gt;
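
&lt;p&gt;For reference, our setup corresponds to a source connector configuration along these lines (a sketch; the connector name, connection URI, database, and collection are placeholders rather than our exact production values):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# MongoDB Kafka source connector properties (sketch; values are placeholders)
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://mongo1:27017
database=test
collection=investigate1
# we need the full document for every update
change.stream.full.document=updateLookup
# workaround: exclude the huge array before the connector serializes it
pipeline=[ { $project: { &quot;fullDocument.studentids&quot;: 0 } } ]&lt;/code&gt;&lt;/pre&gt;</description>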
                <environment>Kafka Connector: 1.2.0&lt;br/&gt;
MongoDB version: 3.6.17</environment>
        <key id="1458823">KAFKA-151</key>
            <summary>Out of Memory Issue with source connector in certain scenario</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13202">Works as Designed</resolution>
                                        <assignee username="ross@mongodb.com">Ross Lawley</assignee>
                                    <reporter username="sabari.mgn@gmail.com">Sabari Gandhi</reporter>
                        <labels>
                    </labels>
                <created>Tue, 1 Sep 2020 13:54:23 +0000</created>
                <updated>Fri, 27 Oct 2023 11:54:15 +0000</updated>
                            <resolved>Wed, 9 Dec 2020 15:57:30 +0000</resolved>
                                    <version>1.2.0</version>
                                                                        <votes>2</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="3522984" author="ross@10gen.com" created="Wed, 9 Dec 2020 15:57:17 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sabari.mgn%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sabari.mgn@gmail.com&quot;&gt;sabari.mgn@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;I was able to set &lt;tt&gt;poll.max.batch.size&lt;/tt&gt; to 100, after which I no longer saw the OOM exception.&lt;/p&gt;

&lt;p&gt;However, with messages this large I did see this exception:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The message is 9896884 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration&lt;/p&gt;&lt;/blockquote&gt;
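
&lt;p&gt;If messages that large are genuinely required, the producer limit can be raised, for example via a per-connector override (a sketch; it assumes the worker&apos;s &lt;tt&gt;connector.client.config.override.policy&lt;/tt&gt; permits overrides, and the topic&apos;s &lt;tt&gt;max.message.bytes&lt;/tt&gt; limit may also need raising):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# connector-level producer override (sketch; ~10MB to fit the message above)
producer.override.max.request.size=10485760&lt;/code&gt;&lt;/pre&gt;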

&lt;p&gt;You may wish to look at changing the document structure to reduce the size of the messages and/or changing the serialization format to use either raw BSON bytes or an Avro schema (if the document structure is normalized).&lt;/p&gt;

&lt;p&gt;I&apos;m going to close this ticket as Works as Designed, because the OOM can also be mitigated by giving the JVM process more heap.&lt;/p&gt;

&lt;p&gt;I hope that helps,&lt;/p&gt;

&lt;p&gt;Ross&lt;/p&gt;</comment>
                            <comment id="3508833" author="sabari.mgn@gmail.com" created="Mon, 30 Nov 2020 18:07:23 +0000"  >&lt;p&gt;Hi Ross, Please let me know when you have any updates or additional information in reproducing the test case. Thanks&lt;/p&gt;</comment>
                            <comment id="3481728" author="sabari.mgn@gmail.com" created="Mon, 9 Nov 2020 16:07:03 +0000"  >&lt;p&gt;Thanks, Ross - let me know if you need additional information in reproducing the issue.&lt;/p&gt;</comment>
                            <comment id="3481233" author="ross@10gen.com" created="Mon, 9 Nov 2020 11:24:09 +0000"  >&lt;p&gt;Moving to in progress - will review the test case and see if I can debug further.&lt;/p&gt;</comment>
                            <comment id="3480151" author="sabari.mgn@gmail.com" created="Fri, 6 Nov 2020 20:08:27 +0000"  >&lt;p&gt;This is the mongoDB support post that I&apos;ve created &lt;a href=&quot;https://developer.mongodb.com/community/forums/t/out-of-memory-issue-with-source-connector-in-certain-scenario/11330?u=sabari_gandhi1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://developer.mongodb.com/community/forums/t/out-of-memory-issue-with-source-connector-in-certain-scenario/11330?u=sabari_gandhi1&lt;/a&gt;&#160;. I was not able to attach the files to setup and reproduce the issue so keeping this ticket updated with the required information.&lt;/p&gt;</comment>
                            <comment id="3480142" author="sabari.mgn@gmail.com" created="Fri, 6 Nov 2020 20:02:12 +0000"  >&lt;p&gt;Thanks for your comments. Please see below the steps to reproduce the issue. These are not the exact data but I was able to reproduce the issue following the steps&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Followed the Docker example&#160;&lt;a href=&quot;https://docs.mongodb.com/kafka-connector/master/kafka-docker-example/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://docs.mongodb.com/kafka-connector/master/kafka-docker-example/&lt;/a&gt;, except that I don&#8217;t create any of the connectors set up by the script (the Confluent Datagen Connector, the MongoDB Kafka Sink Connector, and the MongoDB Kafka Source Connector).&lt;/li&gt;
	&lt;li&gt;Import the attached JSON data (sample_data.json):&#160;&lt;tt&gt;docker exec -i mongo1 sh -c &apos;mongoimport -c investigate1 -d test&apos; &amp;lt; sample_data.json&lt;/tt&gt;. As you can see, the document has two huge arrays, studentids and groupids.&lt;/li&gt;
	&lt;li&gt;Register the source connector with the attached configuration (source_connect.sh) via the REST API.&lt;/li&gt;
	&lt;li&gt;As the configuration shows, I need the full document for every update, and I use the attached small Java utility (ImportData.java) to update the document 1000 times; a sketch of such a loop follows this list.&lt;/li&gt;
&lt;/ol&gt;
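
&lt;p&gt;A minimal sketch of such an update loop (the real utility is the attached ImportData.java; the connection URI and the updated field name here are assumptions based on the steps above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Sketch only; URI, namespace, and field name are assumptions.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.exists;
import static com.mongodb.client.model.Updates.set;

public class ImportData {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(&quot;mongodb://localhost:27017&quot;)) {
            MongoCollection&amp;lt;Document&amp;gt; coll =
                    client.getDatabase(&quot;test&quot;).getCollection(&quot;investigate1&quot;);
            // Each update produces a change event; with updateLookup the
            // connector then fetches the full &amp;gt;8MB document every time.
            for (int i = 0; i &amp;lt; 1000; i++) {
                coll.updateOne(exists(&quot;studentids&quot;), set(&quot;counter&quot;, i));
            }
        }
    }
}&lt;/code&gt;&lt;/pre&gt;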


&lt;p&gt;Regarding the JVM settings: I allocated 4G by adding KAFKA_HEAP_OPTS: &#8220;-Xmx4G&#8221; to the docker-compose config. As mentioned, I see the issue in our production environment, where we use containers with 6G and an allocated heap size of 5G. We tried lowering the batch size to as low as 300 but still hit the issue in some scenarios.&lt;/p&gt;
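
&lt;p&gt;Concretely, the docker-compose entry looks like this (a sketch; the service name &lt;tt&gt;connect&lt;/tt&gt; is an assumption):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# docker-compose sketch: heap for the Kafka Connect worker container
connect:
  environment:
    KAFKA_HEAP_OPTS: &quot;-Xmx4G&quot;&lt;/code&gt;&lt;/pre&gt;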

&lt;p&gt;Please let me know if you need additional information or have questions about the steps.&lt;/p&gt;</comment>
                            <comment id="3394254" author="ross@10gen.com" created="Tue, 15 Sep 2020 13:12:48 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sabari.mgn%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sabari.mgn@gmail.com&quot;&gt;sabari.mgn@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for reaching out. For future reference, as this sounds like a support issue, I wanted to give you some resources to get this question answered more quickly:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;our MongoDB support portal, located at &lt;a href=&quot;https://support.mongodb.com/welcome&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;support.mongodb.com&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;our MongoDB community portal, located &lt;a href=&quot;https://developer.mongodb.com/community/forums/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;If you are an Atlas customer, there is free support offered 24/7 in the lower right hand corner of the UI.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Just in case you have already opened a support case and are not receiving sufficient help, please let me know and I can facilitate escalating your issue.&lt;/p&gt;


&lt;p&gt;With regards to the OOM error: the line in question converts the change stream document into a raw JSON string. The polling mechanism in source connectors batches up changes before publishing them to the topic. This can be configured by setting &lt;tt&gt;poll.max.batch.size&lt;/tt&gt;, which by default will try to batch 1,000 source records before publishing them to the topic. Reducing this max batch size should prevent OOM errors.&lt;/p&gt;
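
&lt;p&gt;For example, a connector setting along these lines shrinks each batch (a sketch; 100 is just an illustrative value):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# reduce the number of records buffered per poll (default: 1000)
poll.max.batch.size=100&lt;/code&gt;&lt;/pre&gt;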

&lt;p&gt;Without error logs, configuration examples, and the JVM configuration, I can&apos;t provide more insight here.&lt;/p&gt;

&lt;p&gt;Ross&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="286104" name="ImportData.java" size="815" author="sabari.mgn@gmail.com" created="Fri, 6 Nov 2020 20:00:02 +0000"/>
                            <attachment id="276370" name="OOM.log" size="3083" author="sabari.mgn@gmail.com" created="Tue, 1 Sep 2020 13:54:20 +0000"/>
                            <attachment id="286103" name="sample_data.json" size="9346496" author="sabari.mgn@gmail.com" created="Fri, 6 Nov 2020 19:59:22 +0000"/>
                            <attachment id="286106" name="source_connect.sh" size="1332" author="sabari.mgn@gmail.com" created="Fri, 6 Nov 2020 20:01:11 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr69m7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>