[SERVER-14813] Upsert and Shard is tightly coupled, and there is no clear documentation on that Created: 07/Aug/14  Updated: 25/Aug/14  Resolved: 25/Aug/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Zhenyu Li Assignee: Thomas Rueckstiess
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-4000 command to change shard key of a coll... Closed
Participants:

 Description   

To perform an upsert on a sharded MongoDB cluster, the query MUST exactly match the shard key.

For example:

update( {a:1, b:2}, { $setOnInsert: { c : 3, d : 4}}, { upsert: true})

The above update does not work if the shard key is, for example,

{a: 1, _id: 1}. 

Is this the designed behavior? If so, why can I not find a detailed example or warning for this? Since the shard key cannot be modified, this is causing huge problems. It completely broke our migration from a standalone server to the sharded cluster.



 Comments   
Comment by Thomas Rueckstiess [ 25/Aug/14 ]

Hi Zhenyu,

I'm glad you were able to resolve the issue.

I've linked this ticket to SERVER-4000, which is a request to have a command to change the shard key. Feel free to watch this ticket and vote for it to increase the priority.

Regards,
Thomas

Comment by Zhenyu Li [ 16/Aug/14 ]

Thomas,

Thanks a lot for your detailed reply. The reason I was trying to use _id in my shard key is that I don't want one chunk to grow too large, and I want Mongo to be able to split that chunk further.
The reason that I cannot use _id in my query is that all my documents have a randomly generated _id field. So, I cannot just get the _id field from the information I have on that particular document.

I ended up doing the following to fix this issue:

I dumped all the data (more than 200 GB) to disk, restored it to a new database, and sharded it again without _id in the shard key. After that, I was able to perform upserts without problems.

If you can, it would be great to have the following:

Have a tool to change the shard key somehow. I know it requires re-distributing all the data, but it would be very valuable to have this tool in the future.
I understand that you guys have the documentation here: http://docs.mongodb.org/manual/reference/method/db.collection.update/#sharded-collections
I just wish that there could be a warning and a link to the above reference on this page: http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/
After all, most people don't dig into the reference section as often as they read the tutorial section.

Thanks,

Zhenyu

Comment by Thomas Rueckstiess [ 07/Aug/14 ]

Hi Zhenyu,

The behavior you're seeing is expected. An update that does not specify {multi:true} has to contain the full shard key or the _id field in order to be routed to the correct shard. This is documented on our page about updates under the Sharded Collections section.

However, in the case of an upsert, specifying {multi:true} in a sharded collection may still not work, as it would create documents on multiple shards that may end up being orphaned (i.e. not belonging to that shard). In my tests, an update in a sharded collection with {multi:true, upsert:true}, but without fully specifying the shard key in the query part, was still disallowed.
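
The two paragraphs above can be summarized as a small decision function. This is only a sketch of the rules as described in this comment, not MongoDB's actual routing code. The names isUpdateAllowed and hasFullShardKey are made up for illustration, and the check only looks at whether each shard key field appears at the top level of the query, which is a simplification.

```javascript
// Hypothetical helper: true if every field of the shard key pattern
// (e.g. { a: 1, _id: 1 }) appears as a top-level field in the query.
function hasFullShardKey(query, shardKey) {
  return Object.keys(shardKey).every((field) =>
    Object.prototype.hasOwnProperty.call(query, field));
}

// Hypothetical helper encoding the rules from this comment.
function isUpdateAllowed(query, shardKey, options = {}) {
  const { multi = false, upsert = false } = options;
  const fullKey = hasFullShardKey(query, shardKey);

  if (upsert) {
    // An upsert needs the full shard key, with or without multi:true.
    return fullKey;
  }
  if (multi) {
    // A plain multi-update does not have to target a single shard.
    return true;
  }
  // A single-document update needs the full shard key or the _id field.
  return fullKey || Object.prototype.hasOwnProperty.call(query, "_id");
}
```

Under these assumptions, isUpdateAllowed({a: 1, b: 2}, {a: 1, _id: 1}, {upsert: true}) returns false, and adding multi: true to the options does not change that.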

Perhaps you can explain what you'd like to achieve with the upsert without specifying the shard key, and what you'd expect to happen if the document can't be found on some of the shards. Understanding the use case better, I may be able to suggest an alternative, or at the least improve our documentation around upsert behavior in sharded collections.

One comment about this statement here:

the query MUST exactly match the shard key.

This is almost correct. The shard key must be fully specified in the query part of the upsert. For shard key {a:1, _id:1}, you could for example run the following upsert:

update( {a:1, b:2, _id: 3}, { $setOnInsert: { c: 3, d: 4}}, { upsert: true })
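
To see which fields a query still needs before it can be used in an upsert, a check along these lines may help. It is a minimal sketch in plain JavaScript, not a server or driver API. The function name missingShardKeyFields is hypothetical, and it only tests for the presence of top-level fields.

```javascript
// Hypothetical helper: list the shard key fields that the query does not specify.
function missingShardKeyFields(query, shardKey) {
  return Object.keys(shardKey).filter(
    (field) => !Object.prototype.hasOwnProperty.call(query, field));
}

const shardKey = { a: 1, _id: 1 };

// Query from the original report: _id is missing, so the upsert is rejected.
console.log(missingShardKeyFields({ a: 1, b: 2 }, shardKey));         // [ '_id' ]

// Query from the example above: the shard key is fully specified.
console.log(missingShardKeyFields({ a: 1, b: 2, _id: 3 }, shardKey)); // []
```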

Regards,
Thomas
