<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 09:06:13 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[KAFKA-366] Parallel bulk writes from sink connector</title>
                <link>https://jira.mongodb.org/browse/KAFKA-366</link>
                <project id="16285" key="KAFKA">Kafka Connector</project>
                    <description>&lt;p&gt;In com.mongodb.kafka.connect.sink.StartedMongoSinkTask#put, a collection of records is grouped into batches of writes by namespace (i.e. MongoDB database and collection name). However, these distinct batches are then written to MongoDB serially, one after another.&lt;/p&gt;

&lt;p&gt;As a result, throughput drops sharply if:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;your sink connector consumes from multiple topics, or&lt;/li&gt;
	&lt;li&gt;you add transforms that split data from one topic into multiple collections.&lt;/li&gt;
&lt;/ol&gt;
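&lt;p&gt;The grouping step, and the thread-pool alternative proposed in this ticket, can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the connector's Java code: the function write_batches_in_parallel, the bulk_write callback and the (namespace, document) record shape are all hypothetical. Because each namespace's documents stay in a single batch handled by a single task, ordering within a collection is preserved while different collections are written concurrently.&lt;/p&gt;

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def write_batches_in_parallel(records, bulk_write, pool_size=4):
    """Group (namespace, document) pairs by namespace, then run one
    bulk_write(namespace, documents) call per namespace on a fixed-size pool.

    Hypothetical sketch: bulk_write stands in for a per-collection MongoDB
    bulk write and pool_size for a configurable option; neither is an API
    or setting of the Kafka connector.
    """
    # Preserve arrival order of documents within each namespace.
    batches = defaultdict(list)
    for namespace, document in records:
        batches[namespace].append(document)

    # One task per namespace, so no two tasks ever write to the same collection.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {
            namespace: pool.submit(bulk_write, namespace, documents)
            for namespace, documents in batches.items()
        }
        # result() blocks until each write finishes and re-raises its error.
        return {namespace: f.result() for namespace, f in futures.items()}
```

&lt;p&gt;For example, with records for db.orders and db.users and a bulk_write that returns len(documents), the function returns one result per namespace; an exception raised by any bulk write is re-raised by Future.result().&lt;/p&gt;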



&lt;p&gt;My team first noticed this issue during a data rate spike that caused the connector to lag behind by over an hour.&lt;/p&gt;

&lt;p&gt;We should be able to run these bulk writes in parallel on a thread pool with a configurable pool size. Since each batch writes to a separate collection, ordering within each collection would not be affected.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2325783">KAFKA-366</key>
            <summary>Parallel bulk writes from sink connector</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="10300" iconUrl="https://jira.mongodb.org/images/icons/priorities/medium.svg">Unknown</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="martin.andersson@kambi.com">Martin Andersson</reporter>
                        <labels>
                    </labels>
                <created>Thu, 27 Apr 2023 12:26:32 +0000</created>
                <updated>Mon, 14 Aug 2023 17:13:36 +0000</updated>
                                                            <fixVersion>1.12.0</fixVersion>
                                    <component>Sink</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                    <comments>
                            <comment id="5393532" author="martin.andersson@kambi.com" created="Tue, 2 May 2023 11:58:36 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=robert.walters%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;robert.walters@mongodb.com&quot;&gt;robert.walters@mongodb.com&lt;/a&gt; I have read it, and the configuration options mentioned in that article do not address the issue described in this ticket. &lt;em&gt;Within a task&lt;/em&gt;, &lt;em&gt;if that task consumes from multiple topics (or its consumed records are mapped to multiple MongoDB namespaces)&lt;/em&gt;, the &lt;b&gt;max.batch.size&lt;/b&gt; configuration option does not apply, because records written to different MongoDB collections are naturally grouped into separate batches.&lt;/p&gt;</comment>
                            <comment id="5392454" author="robert.walters" created="Mon, 1 May 2023 22:31:03 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=martin.andersson%40kambi.com&quot; class=&quot;user-hover&quot; rel=&quot;martin.andersson@kambi.com&quot;&gt;martin.andersson@kambi.com&lt;/a&gt;, please review &lt;a href=&quot;https://www.mongodb.com/developer/products/connectors/tuning-mongodb-kafka-connector/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.mongodb.com/developer/products/connectors/tuning-mongodb-kafka-connector/&lt;/a&gt;. The Sink section of this blog post has some recommendations for improving sink write performance, specifically setting the tasks.max property.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_21553" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Quarter</customfieldname>
                        <customfieldvalues>
                                        <label>FY24Q3</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr3mmh:0400000950lw</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>