[SERVER-7487] Support overwriting existing documents in bulk insert operation Created: 26/Oct/12 Updated: 18/Jul/14 Resolved: 18/Jul/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Pawel | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
I have a relatively simple use case, IMHO. The system receives events that eventually end up as objects in a Mongo collection. The system has an "event cache" that accumulates data for individual events over some period of time. The event cache uses a primary-key scheme to identify updates to the same event, and this primary key is then used as the document ID in Mongo. Once the event cache deems a number of events "completed", those events are flushed out to Mongo and removed from the cache. This flushing is done using a bulk insert.

Once in a blue moon, however, something goes wrong and the cache cannot purge the events that have already been flushed out. As a result, the next flush's bulk insert fails because of a document ID collision. I've set "ContinueOnLastError" to true, and I hope this will prevent exceptions in the bulk inserts (I'm using the Java driver). However, in case of a collision I would prefer that the incoming document overwrite the existing one, instead of the one already in the collection being preferred.

Would it not be reasonable to add a feature to overwrite documents during bulk insert? This is not an upsert, as the document is completely overwritten, not updated. |
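To illustrate, the flush currently looks roughly like this (a sketch only, not the reporter's actual code, assuming Java driver 2.12+; the collection, batch, and method names are placeholders):

```java
import com.mongodb.*;
import java.util.List;

public class EventFlush {
    // Flush "completed" events from the cache into MongoDB in one round trip.
    static void flush(DBCollection events, List<DBObject> batch) {
        try {
            // continueOnError keeps inserting past a duplicate-key error
            // instead of aborting the whole batch, but colliding documents
            // are still skipped: the copy already in the collection wins.
            events.insert(batch, new InsertOptions().continueOnError(true));
        } catch (MongoException e) {
            // Duplicate _id errors from an un-purged previous flush surface here.
        }
    }
}
```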
| Comments |
| Comment by Pawel [ 18/Jul/14 ] |
|
OK, I misunderstood Bulk. I can attach multiple find() operations to a single Bulk, each of which I can then use to do an individual upsert. So it would be:
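Something like the following with the Java driver (a sketch; the collection and batch names are placeholders):

```java
import com.mongodb.*;
import java.util.List;

public class BulkSave {
    static void flush(DBCollection events, List<DBObject> batch) {
        BulkWriteOperation bulk = events.initializeUnorderedBulkOperation();
        for (DBObject doc : batch) {
            // One find() per document: match on _id, replace the whole
            // document, and insert it if it doesn't exist yet.
            // This is effectively a bulk "save".
            bulk.find(new BasicDBObject("_id", doc.get("_id")))
                .upsert()
                .replaceOne(doc);
        }
        BulkWriteResult result = bulk.execute();
    }
}
```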
In which case, what I was asking about originally is already covered by the Bulk API. |
| Comment by Scott Hernandez (Inactive) [ 18/Jul/14 ] |
|
With the bulk API you can construct a bulk save, as you describe it, so this capability is already there. If you change all your inserts into updates with upsert set to true, I think you can do what you want with the bulk update calls; see http://docs.mongodb.org/manual/reference/method/Bulk.find.upsert/#bulk-find-upsert. If your concern is purely performance, then that can be addressed separately and should be a new issue. |
| Comment by Pawel [ 18/Jul/14 ] |
|
Thank you. I'm using the Java driver, and I see that its latest version also supports the Bulk API. However, it only helps my case a little. What I'm trying to do is insert a large number of objects (up to 10K) in a single insert operation. The objects are fully independent, but their _id field is controlled by the application. Using the Bulk API I would do:
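Roughly this (a sketch; the collection and batch names are placeholders):

```java
import com.mongodb.*;
import java.util.List;

public class BulkInsertFlush {
    static void flush(DBCollection events, List<DBObject> batch) {
        BulkWriteOperation bulk = events.initializeUnorderedBulkOperation();
        for (DBObject doc : batch) {
            bulk.insert(doc); // plain inserts; _id is set by the application
        }
        bulk.execute();
    }
}
```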
Now suppose that one or more of those objects has an _id that clashes with the _id of a document already in the collection. Those inserts will fail. What I would ideally like is an option to do a "save" instead of an "insert". A bulk update/upsert won't help, because it requires a query that selects the objects and a single update applicable to all of the selected objects. Where the Bulk API does help me is that I can at least pinpoint the inserts that failed and then run update() on them individually, as in the sketch below (the DBCollection.insert(Object...) method in previous Java drivers didn't even indicate which objects failed to insert). I don't think that requesting what is, in essence, a bulk "save" is unreasonable anyway. |
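A sketch of that recovery path, wrapping the execute() call from the snippet above in a try/catch (BulkWriteException and getWriteErrors() are from Java driver 2.12; the names remain placeholders):

```java
try {
    bulk.execute();
} catch (BulkWriteException e) {
    for (BulkWriteError err : e.getWriteErrors()) {
        // err.getIndex() points back into the batch, identifying the
        // document whose insert failed (e.g. on a duplicate _id).
        DBObject doc = batch.get(err.getIndex());
        // Replace it individually: an update with upsert and a full
        // replacement document, i.e. save semantics.
        events.update(new BasicDBObject("_id", doc.get("_id")),
                      doc, /* upsert */ true, /* multi */ false);
    }
}
```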
| Comment by Scott Hernandez (Inactive) [ 18/Jul/14 ] |
|
Can you please provide an example in pseudo-code, along with docs, showing what you want and what document states should exist after each operation, to clarify your needs? Since you filed this, we have also made some changes to our bulk API and its features that you might want to read about if you haven't yet: http://docs.mongodb.org/manual/reference/method/js-bulk/. This can be re-opened once it is clear that what you need isn't already supported. |
| Comment by Pawel [ 18/Jul/14 ] |
|
You have completely ignored everything I said, and did not leave me any chance to respond or to re-open this issue. First, I was talking about bulk insert. Supposedly it allows for pipelining, which makes it significantly faster. Of course, if there is no actual pipelining, then this is a moot point. |
| Comment by Scott Hernandez (Inactive) [ 18/Jul/14 ] |
|
You can specify an update with upsert to replace the whole document; this is, in fact, what save does.
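For example (a sketch with the Java driver's update(query, update, upsert, multi) signature; the collection and document names are illustrative):

```java
// The update document contains no $ operators, so the matched document
// is replaced wholesale; upsert=true inserts it when there is no match.
// This is what save(doc) amounts to when doc has an _id.
DBObject query = new BasicDBObject("_id", doc.get("_id"));
events.update(query, doc, /* upsert */ true, /* multi */ false);
```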
|