<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 02:54:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-509] Add option to continue with bulk insert on duplicate key/object id</title>
                <link>https://jira.mongodb.org/browse/SERVER-509</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;hi guys,&lt;/p&gt;

&lt;p&gt;would it be possible to add an option to continue processing a bulk insert after a duplicate key/object id has occurred?&lt;/p&gt;

&lt;p&gt;my usecase:&lt;br/&gt;
i&apos;m building a gridfs clone with data deduplication based on a hash of the chunk data and file revisions. my collection with the chunks looks like this: &lt;br/&gt;
{_id: &amp;lt;objectid&amp;gt;, data: &amp;lt;bin-data&amp;gt;}&lt;/p&gt;

&lt;p&gt;the object id of each chunk is a 12 byte hash of the chunk contents, for which i&apos;m using the md4 algorithm. (this is faster than maintaining a separate unique index just for the chunk hash.)&lt;/p&gt;

&lt;p&gt;if i&apos;m inserting a 100MB file (1600 chunks of 64KB), duplicate chunks won&apos;t be saved. this is my poor man&apos;s method of deduplication &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;br/&gt;
i thought it would be much faster to collect 1000 chunks and do a bulk insert. but if the DB doesn&apos;t process the remaining chunks after a duplicate object id is found, my method doesn&apos;t work.&lt;br/&gt;
i would be very happy to have an option to continue on duplicate chunks.&lt;/p&gt;

&lt;p&gt;e.g. in pymongo:&lt;br/&gt;
db.collection.insert(&amp;#91;my_doc_list&amp;#93;, skip_duplicates=True)&lt;/p&gt;

&lt;p&gt;thanks in advance,&lt;br/&gt;
marc&lt;/p&gt;</description>
                <environment></environment>
        <key id="11055">SERVER-509</key>
            <summary>Add option to continue with bulk insert on duplicate key/object id</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="kbanker">Kyle Banker</assignee>
                                    <reporter username="marc">Marc Boeker</reporter>
                        <labels>
                    </labels>
                <created>Wed, 30 Dec 2009 12:05:29 +0000</created>
                <updated>Tue, 12 Jul 2016 00:28:45 +0000</updated>
                            <resolved>Mon, 23 May 2011 23:01:06 +0000</resolved>
                                                    <fixVersion>1.9.1</fixVersion>
                                    <component>Index Maintenance</component>
                                        <votes>14</votes>
                                    <watches>18</watches>
                                                                                                                <comments>
                            <comment id="34390" author="antoine" created="Wed, 25 May 2011 18:21:56 +0000"  >&lt;p&gt;quick notes:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;how can you use md4 to deduplicate chunks?&lt;br/&gt;
It sounds like if the md4 matches, you don&apos;t insert the chunk, but what happens on a hash collision when the data is not exactly the same?&lt;/li&gt;
	&lt;li&gt;considering that your chunks are rather large, you probably won&apos;t see much of an improvement for bulk insert vs fire-and-forget single inserts.&lt;br/&gt;
Often the bottleneck will be disk, and single inserts can use multiple connections / server threads for processing.&lt;br/&gt;
Let us know if you see much of a difference&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="34034" author="auto" created="Mon, 23 May 2011 22:56:24 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
Mathias Stearn (RedBeard0531, mathias@10gen.com)
&lt;p&gt;Message: Add InsertOption_KeepGoing to keep going after error on bulk insert. &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-509&quot; title=&quot;Add option to continue with bulk insert on duplicate key/object id&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-509&quot;&gt;&lt;del&gt;SERVER-509&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/b690e237fd7055ad1da8950882c62b4fab82baee&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/b690e237fd7055ad1da8950882c62b4fab82baee&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="34033" author="auto" created="Mon, 23 May 2011 22:56:22 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
Mathias Stearn (RedBeard0531, mathias@10gen.com)
&lt;p&gt;Message: minor refactor to prep for &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-509&quot; title=&quot;Add option to continue with bulk insert on duplicate key/object id&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-509&quot;&gt;&lt;del&gt;SERVER-509&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/0e28f89602e84f6ea6009cf5d5d91da675c1d199&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/0e28f89602e84f6ea6009cf5d5d91da675c1d199&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="29219" author="bdarfler" created="Mon, 18 Apr 2011 17:38:50 +0000"  >&lt;p&gt;If possible it would be nice to return the items that were not inserted or otherwise give feedback as to which ones failed.&lt;/p&gt;</comment>
                            <comment id="26534" author="knut" created="Wed, 23 Mar 2011 02:07:29 +0000"  >&lt;p&gt;In my use case I could take advantage of being the only client inserting into the collection.&lt;br/&gt;
That way I could insert at will and after each batch I check the size of the collection.  When the&lt;br/&gt;
size is smaller than expected I can easily calculate which element in my batch caused the failure and discard/retry accordingly.&lt;/p&gt;

&lt;p&gt;This use case may be common enough that it might make sense to support it in some library form, maybe even the driver.&lt;/p&gt;</comment>
                            <comment id="26529" author="oferfort" created="Tue, 22 Mar 2011 23:55:24 +0000"  >&lt;p&gt;this is something we&apos;d also love to have, as it would reduce our calls to insert dramatically.&lt;/p&gt;</comment>
                            <comment id="22582" author="eliot" created="Fri, 21 Jan 2011 16:40:36 +0000"  >&lt;p&gt;To do this, all driver APIs will need to change.&lt;br/&gt;
Should do at beginning of next cycle so they have time to change.&lt;/p&gt;</comment>
                            <comment id="12971" author="dwight_10gen" created="Sun, 14 Mar 2010 12:52:15 +0000"  >&lt;p&gt;yes this makes sense&lt;/p&gt;

&lt;p&gt;given the chunks are pretty big though, i think you will find singleton inserts to be very fast if you do not call getlasterror after each insert.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="11270">TOOLS-72</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Sun, 14 Mar 2010 12:52:15 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        12 years, 39 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            12 years, 39 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>antoine</customfieldvalue>
            <customfieldvalue>auto</customfieldvalue>
            <customfieldvalue>bdarfler</customfieldvalue>
            <customfieldvalue>dwight@mongodb.com</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>knut</customfieldvalue>
            <customfieldvalue>kbanker</customfieldvalue>
            <customfieldvalue>marc</customfieldvalue>
            <customfieldvalue>oferfort</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrpte7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrimlz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>22315</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht0t7j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>