[SERVER-4004] Bulk Upsert Created: 03/Oct/11 Updated: 15/May/17 Resolved: 08/Jan/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Usability, Write Ops |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | 2.5.5 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Glen Holcomb | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 37 |
| Labels: | performance |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Participants: |
| Description |
|
When presented with a mix of new documents and existing documents that may have been modified, it would be nice to have a bulk upsert; this could save a great deal of time. The current workaround is to partition the documents into existing and new, batch-insert the new ones, and then update each existing document individually. When there are a significant number of existing documents, this is unnecessarily time-consuming. |
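The workaround described above can be sketched as follows. This is an illustrative simulation only: a plain `Map` stands in for the collection, and all function names are made up for this sketch, not a real driver API.

```javascript
// Sketch of the workaround: partition incoming docs into "existing"
// (already in the collection) and "new", then batch-insert the new ones
// and update the existing ones one at a time. A Map stands in for the
// collection; every name here is illustrative.
function bulkUpsertWorkaround(collection, docs) {
  const existing = [];
  const fresh = [];
  for (const doc of docs) {
    (collection.has(doc._id) ? existing : fresh).push(doc);
  }
  // One round trip: batch insert of the genuinely new documents.
  for (const doc of fresh) collection.set(doc._id, doc);
  // N round trips: each existing document is updated individually,
  // which is where the latency cost accumulates.
  for (const doc of existing) collection.set(doc._id, doc);
  return { inserted: fresh.length, updated: existing.length };
}

const collection = new Map([["a", { _id: "a", v: 1 }]]);
const result = bulkUpsertWorkaround(collection, [
  { _id: "a", v: 2 }, // existing -> individual update
  { _id: "b", v: 3 }, // new      -> part of the batch insert
]);
```

A true bulk upsert would collapse the per-document update loop into a single round trip, which is the saving the ticket asks for.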
| Comments |
| Comment by Alberto Lerner [ 31/Oct/13 ] |
|
@eliot: Is this any different than updates with write commands? Any changes in the shell? |
| Comment by Kevin J. Rice [ 25/Feb/13 ] |
|
[Adding the same comment here as to SERVER-2172] I'm issuing thousands of update statements a second. I would like to send them as a single list of updates rather than issue them one by one, with the per-call latency that implies. There is already support for batch inserts, and this feature could tie in with that functionality.

Playing devil's advocate to my own argument, I can see one possible complication. I have a sharded, replicated collection, so if I send in a list of updates (each targeting a single document by _id), the router (mongos) would have to split the list into separate sub-lists for the respective shards. Other than that, it seems like a straightforward performance win.

Re: failures, handle them the same way as batch inserts: apply all the updates that can be applied and return data on which ones failed, or whatever is easiest to implement, and I'll cope with the downsides. |
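The routing concern raised in this comment can be sketched as below. This is a toy model, not mongos internals: `chooseShard` uses a made-up hash in place of real chunk-range or hashed-key routing, and all names are illustrative.

```javascript
// Sketch of the routing concern: a mongos-like router splitting one
// client batch of _id-keyed updates into per-shard sub-batches.
function chooseShard(id, shardCount) {
  // Toy hash routing; real sharding routes by chunk ranges or hashed keys.
  let h = 0;
  for (const ch of String(id)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % shardCount;
}

function splitBatchByShard(updates, shardCount) {
  const batches = new Map();
  for (const u of updates) {
    const shard = chooseShard(u._id, shardCount);
    if (!batches.has(shard)) batches.set(shard, []);
    batches.get(shard).push(u);
  }
  return batches; // each sub-batch would go to its shard in one request
}

const batches = splitBatchByShard(
  [{ _id: "a", v: 1 }, { _id: "b", v: 2 }, { _id: "c", v: 3 }],
  2
);
```

Since every update is targeted by _id, each one routes to exactly one shard, so the split preserves the batch's total size.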
| Comment by Chris Scribner [ 20/Dec/12 ] |
|
+1 - This would be helpful for writing scripts to upgrade data in large tables. Specifically, in cases where a simple update script won't suffice, and we can't use server-side javascript due to sharding and availability concerns. |
| Comment by David K. Storrs [ 31/Oct/12 ] |
|
Thanks Eliot, I'll run the benchmarks again. As you guessed, the prior time I was still on 2.0. |
| Comment by Eliot Horowitz (Inactive) [ 31/Oct/12 ] |
|
David: you should try findAndModify in 2.2. |
| Comment by David K. Storrs [ 31/Oct/12 ] |
|
@Travis Krick: Another reason not to use FindAndModify is that it's ~100x slower than update. (According to benchmarks I ran; standard issues with other people's benchmarks apply, of course.) |
| Comment by Travis Krick [ 31/Oct/12 ] |
|
My application would also make good use of a bulk/batch upsert. Agreed: the code needed to work around this is less than desirable. findAndModify isn't sufficient because I want to modify each matching document in its own different way. |
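The per-document case in this comment can be sketched as a batch where each operation carries its own filter and its own modification, which a single shared update (as in one findAndModify call) cannot express. Again a `Map` stands in for the collection and the shapes are illustrative, not a real driver API.

```javascript
// Sketch: a batch of ops where every op has its own filter and its own
// $set-style update, applied against an in-memory stand-in collection.
function applyBatch(collection, ops) {
  let matched = 0;
  for (const { filter, update } of ops) {
    const doc = collection.get(filter._id);
    if (doc) {
      Object.assign(doc, update.$set); // each doc gets its own change
      matched++;
    }
  }
  return matched;
}

const docs = new Map([
  ["a", { _id: "a", qty: 1 }],
  ["b", { _id: "b", qty: 5 }],
]);
const matched = applyBatch(docs, [
  { filter: { _id: "a" }, update: { $set: { qty: 2 } } },         // one change
  { filter: { _id: "b" }, update: { $set: { status: "done" } } }, // a different one
  { filter: { _id: "x" }, update: { $set: { qty: 9 } } },         // no match
]);
```

This is essentially the shape the bulk write API later took: a list of independent per-document operations submitted in one request.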
| Comment by David K. Storrs [ 24/Jul/12 ] |
|
Please add this. The amount of work and database thrashing that we have to go through to work around it is substantial. |