[SERVER-14813] Upsert and sharding are tightly coupled, and there is no clear documentation on that Created: 07/Aug/14 Updated: 25/Aug/14 Resolved: 25/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Zhenyu Li | Assignee: | Thomas Rueckstiess |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Participants: | |
| Description |
|
To perform an "upsert" operation on a sharded MongoDB cluster, the query MUST exactly match the shard key; the update does not work if the shard key is, for example, a compound key that the query does not fully specify.
Is this the designed behavior? If so, why can I not find a detailed example or warning about this? Since the shard key cannot be modified, this is causing huge problems. It completely broke our migration from a standalone server to the sharded cluster. |
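The original example commands did not survive this export. As an illustration only (the collection name `users` and the field `b` are assumptions; the compound shard key `{a: 1, _id: 1}` is taken from Thomas's reply below), an upsert of this shape is rejected because the query does not contain the full shard key:

```javascript
// Hypothetical names; shard key is { a: 1, _id: 1 }.
// The query specifies only "a", not the full shard key,
// so mongos cannot route the upsert to a single shard:
db.users.update({ a: 5 }, { $set: { b: 1 } }, { upsert: true })
```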
| Comments |
| Comment by Thomas Rueckstiess [ 25/Aug/14 ] | |
|
Hi Zhenyu, I'm glad you were able to resolve the issue. I've linked this ticket to the related issue. Regards, Thomas | |
| Comment by Zhenyu Li [ 16/Aug/14 ] | |
|
Thomas, thanks a lot for your detailed reply. The reason I was trying to use _id in my shard key is that I don't want one chunk to grow too large; I want Mongo to split that chunk further. I ended up doing the following to fix this issue: dump all the data (more than 200GB) to disk, restore it to a new database, and re-shard it without using the _id key. After that, I was able to perform upserts without problems. If you can, it would be great to have a tool to change the shard key somehow. I know it requires re-distributing all the data, but it would be very valuable to have this tool in the future. Thanks, Zhenyu | |
| Comment by Thomas Rueckstiess [ 07/Aug/14 ] | |
|
Hi Zhenyu, the behavior you're seeing is expected. An update that does not specify {multi:true} has to contain the full shard key or the _id field in its query in order to be routed to the correct shard. This is documented on our page about updates, under the Sharded Collections section. However, in the case of an upsert, specifying {multi:true} on a sharded collection may still not work, as it could create documents on multiple shards that end up orphaned (i.e. not belonging to that shard). In my tests, an update on a sharded collection with {multi:true, upsert:true}, but without fully specifying the shard key in the query part, was still disallowed. Perhaps you can explain what you'd like to achieve with the upsert without specifying the shard key, and what you'd expect to happen if the document can't be found on some of the shards. Understanding the use case better, I may be able to suggest an alternative, or at least improve our documentation around upsert behavior in sharded collections. One comment about this statement:
This is almost correct. The shard key must be fully specified in the query part of the upsert. For shard key {a:1, _id:1}, you could for example run the following upsert:
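The example command itself appears to have been stripped from this export; a command of the following shape (collection name and values are hypothetical) fully specifies the shard key {a:1, _id:1} in the query and would therefore be accepted:

```javascript
// Query contains both shard key fields, so mongos can route the upsert:
db.users.update(
  { a: 5, _id: 7 },        // full shard key { a: 1, _id: 1 }
  { $set: { b: 1 } },
  { upsert: true }
)
```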
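The routing rule described above can be sketched in a few lines of plain JavaScript. This is only an illustrative model of the check, not actual mongos code, and the function name is made up:

```javascript
// Illustrative model only, not mongos source. A non-multi update can be
// routed if its query contains every shard key field; a plain update may
// also target by _id, but an upsert must carry the full shard key.
function canRoute(query, shardKeyFields, { upsert = false } = {}) {
  const hasFullShardKey = shardKeyFields.every(
    (f) => Object.prototype.hasOwnProperty.call(query, f)
  );
  const hasId = Object.prototype.hasOwnProperty.call(query, "_id");
  return upsert ? hasFullShardKey : (hasFullShardKey || hasId);
}

// Shard key { a: 1, _id: 1 }:
console.log(canRoute({ a: 5 }, ["a", "_id"], { upsert: true }));         // false
console.log(canRoute({ a: 5, _id: 7 }, ["a", "_id"], { upsert: true })); // true
```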
Regards, |