[SERVER-4004] Bulk Upsert
Created: 03/Oct/11  Updated: 15/May/17  Resolved: 08/Jan/14

Status: Closed
Project: Core Server
Component/s: Performance, Usability, Write Ops
Affects Version/s: 2.0.0
Fix Version/s: 2.5.5

Type: Improvement
Priority: Major - P3
Reporter: Glen Holcomb
Assignee: Greg Studer
Resolution: Done
Votes: 37
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-2172 Support Batching Mixed Operations Aga... Backlog
Duplicate
duplicates SERVER-9038 New write operation method for insert... Closed
is duplicated by SERVER-7563 Support for Batched Upsert Closed
Related
related to SERVER-2172 Support Batching Mixed Operations Aga... Backlog

 Description   

When presented with a mix of new documents and existing documents that may have been modified, it would be nice to have a bulk upsert. This would potentially save a great deal of time. The current work-around is to partition the documents into existing and new, insert the new ones, and then update each existing document individually.

When there is a significant number of existing documents, this is unnecessarily time-consuming.
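A minimal sketch of that work-around, to make the cost visible. It uses a plain Python dict keyed by _id as a stand-in for a collection, so no driver calls are shown; the function names `partition_by_existence` and `workaround_upsert` are hypothetical and exist only for illustration.

```python
def partition_by_existence(docs, existing_ids):
    """Split docs into (new, existing) by whether their _id is already stored."""
    new_docs, existing_docs = [], []
    for doc in docs:
        (existing_docs if doc["_id"] in existing_ids else new_docs).append(doc)
    return new_docs, existing_docs


def workaround_upsert(store, docs):
    """Emulate the work-around against `store`, a dict keyed by _id.

    Returns the number of separate write calls a real client would issue.
    """
    new_docs, existing_docs = partition_by_existence(docs, store.keys())
    calls = 0
    if new_docs:
        # One batch insert covers every new document.
        store.update({d["_id"]: d for d in new_docs})
        calls += 1
    for doc in existing_docs:
        # Each existing document needs its own update call.
        store[doc["_id"]] = doc
        calls += 1
    return calls
```

Under these assumptions the number of write calls grows with the number of existing documents, which is the overhead a bulk upsert would remove.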



 Comments   
Comment by Alberto Lerner [ 31/Oct/13 ]

eliot: Is this any different from updates with write commands? Any changes in the shell?

Comment by Kevin J. Rice [ 25/Feb/13 ]

[ Adding same comment here as to SERVER-2172 ]

I'm issuing thousands of update statements a second. I would like to send them as a list of updates rather than one by one, with the latency that involves.

Currently there is support for batch inserts. This feature may tie in with that functionality.

Playing devil's advocate to my own argument, I can see a possible complication. I have a sharded, replicated collection. So, let's say I send in a list of updates (each targeting a single, separate document by _id). The router (mongos) would have to split the updates into separate lists, one for each shard.
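The splitting step might look roughly like the sketch below. It is not how mongos is implemented; it assumes a collection sharded on _id with integer keys and a hypothetical sorted list `chunk_bounds` of split points, and the `{"q": ..., "u": ..., "upsert": ...}` shape of each update is used purely for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def shard_index(key, chunk_bounds):
    """Return the index of the shard whose key range contains `key`.

    Keys below chunk_bounds[0] go to shard 0, keys in
    [chunk_bounds[0], chunk_bounds[1]) go to shard 1, and so on.
    """
    return bisect_right(chunk_bounds, key)


def split_by_shard(updates, chunk_bounds):
    """Group a list of updates into one list per target shard."""
    batches = defaultdict(list)
    for update in updates:
        target = shard_index(update["q"]["_id"], chunk_bounds)
        batches[target].append(update)
    return dict(batches)
```

Each per-shard list could then be sent as one request, so the client still pays for a single round trip to the router.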

Other than that, it seems a straightforward performance improvement.

Re: failures, handle them the same way as batch inserts: insert all possible records/documents and return data on which ones failed. Or do whatever is easiest to implement, and I'll cope with the downsides.
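As a rough sketch of that "apply everything possible, report what failed" behaviour, again against a dict standing in for a collection. The function name `apply_unordered` and the shape of the returned report are hypothetical, and a missing or unhashable _id is used only as a stand-in for whatever would make a real write fail; the server's own rules differ.

```python
def apply_unordered(store, docs):
    """Upsert every doc into `store`, continuing past individual failures.

    Returns a report with the count of applied writes and, for each
    failure, the position of the offending document in the input list.
    """
    errors = []
    applied = 0
    for index, doc in enumerate(docs):
        try:
            # Stand-in failure: KeyError if _id is missing,
            # TypeError if _id cannot be used as a dict key.
            store[doc["_id"]] = doc
            applied += 1
        except (KeyError, TypeError) as exc:
            errors.append({"index": index, "error": repr(exc)})
    return {"applied": applied, "errors": errors}
```

Reporting failures by input position lets the caller retry or log only the documents that did not go through.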

Comment by Chris Scribner [ 20/Dec/12 ]

+1 - This would be helpful for writing scripts to upgrade data in large tables, specifically in cases where a simple update script won't suffice and we can't use server-side JavaScript due to sharding and availability concerns.

Comment by David K. Storrs [ 31/Oct/12 ]

Thanks Eliot, I'll run the benchmarks again. As you guessed, the prior time I was still on 2.0.

Comment by Eliot Horowitz (Inactive) [ 31/Oct/12 ]

david - you should try findAndModify in 2.2
Should be a lot faster.

Comment by David K. Storrs [ 31/Oct/12 ]

@Travis Krick: Another reason not to use findAndModify is that it's ~100x slower than update. (According to benchmarks I ran; standard issues with other people's benchmarks apply, of course.)

Comment by Travis Krick [ 31/Oct/12 ]

My application would also make good use of a bulk/batch upsert. Agreed, the code needed to work around this is less than desirable. findAndModify isn't sufficient because I want to modify each matching document in its own way.

Comment by David K. Storrs [ 24/Jul/12 ]

Please add this. The amount of work and database thrashing that we have to go through to work around it is substantial.
