[SERVER-509] Add option to continue with bulk insert on duplicate key/object id Created: 30/Dec/09 Updated: 12/Jul/16 Resolved: 23/May/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | 1.9.1 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Marc Boeker | Assignee: | Kyle Banker |
| Resolution: | Done | Votes: | 14 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Participants: | |
| Description |
|
Hi guys, would it be possible to add an option that lets a bulk insert continue processing once a duplicate key/object id has occurred? My use case: the object id of each chunk is a 12-byte hash of the chunk contents, computed with the MD4 algorithm. (This is faster than maintaining a separate unique index on the chunk hash.) If I'm inserting a 100 MB file (1600 chunks of 64 KB), duplicate chunks won't be saved. This is my poor man's method of deduplication, e.g. in PyMongo. Thanks in advance, |
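The reporter's scheme can be sketched as follows. This is an illustrative reconstruction, not the reporter's actual code: it uses MD5 truncated to 12 bytes (the size of an ObjectId) as a portable stand-in for MD4, since Python's `hashlib` does not guarantee MD4 support, and it uses the modern PyMongo spelling `insert_many(..., ordered=False)` for the continue-on-error behavior this ticket requested.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks, as in the report


def chunk_id(chunk: bytes) -> bytes:
    # 12-byte content hash (the size of an ObjectId); MD5 truncated to
    # 12 bytes stands in for the reporter's MD4.
    return hashlib.md5(chunk).digest()[:12]


def make_chunks(data: bytes) -> list:
    """Split a blob into 64 KB chunk documents keyed by content hash."""
    return [
        {"_id": chunk_id(data[i:i + CHUNK_SIZE]), "data": data[i:i + CHUNK_SIZE]}
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def store(collection, data: bytes) -> None:
    """Insert all chunks in one batch. With ordered=False, PyMongo keeps
    inserting past duplicate-key errors, so chunks that are already in the
    collection are simply skipped -- the poor man's deduplication."""
    from pymongo.errors import BulkWriteError
    try:
        collection.insert_many(make_chunks(data), ordered=False)
    except BulkWriteError:
        pass  # duplicate _id errors are expected and harmless here
```

Because identical chunks hash to the same `_id`, a file made of repeated blocks stores each distinct block only once; the duplicate-key errors on re-insertion are swallowed rather than aborting the batch.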
| Comments |
| Comment by Antoine Girbal [ 25/May/11 ] |
|
quick notes:
|
| Comment by auto [ 23/May/11 ] |
|
Author: Mathias Stearn (RedBeard0531) &lt;mathias@10gen.com&gt;
Message: Add InsertOption_KeepGoing to keep going after error on bulk insert. |
| Comment by auto [ 23/May/11 ] |
|
Author: Mathias Stearn (RedBeard0531) &lt;mathias@10gen.com&gt;
Message: minor refactor to prep for |
| Comment by Benjamin Darfler [ 18/Apr/11 ] |
|
If possible it would be nice to return the items that were not inserted or otherwise give feedback as to which ones failed. |
| Comment by Knut Forkalsrud [ 23/Mar/11 ] |
|
In my use case I could take advantage of being the only client inserting into the collection. This use case may be common enough that it might make sense to support it in some library form, maybe even in the driver. |
| Comment by ofer fort [ 22/Mar/11 ] |
|
this is something we'd also love to have, as it would reduce our calls to insert dramatically. |
| Comment by Eliot Horowitz (Inactive) [ 21/Jan/11 ] |
|
To do this, all driver APIs will need to change. |
| Comment by Dwight Merriman [ 14/Mar/10 ] |
|
Yes, this makes sense given that the chunks are pretty big. That said, I think you will find singleton inserts to be very fast if you do not call getLastError after each insert. |