[SERVER-4000] command to change shard key of a collection Created: 02/Oct/11 Updated: 06/Dec/22 Resolved: 30/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Dwight Merriman | Assignee: | [DO NOT USE] Backlog - Sharding NYC |
| Resolution: | Done | Votes: | 31 |
| Labels: | sharding-lifecycle |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding NYC |
| Participants: | |
| Description |
|
Changing shard keys is fundamentally very expensive, but a helper to do this would be useful. The main thing needed would be to do the operation with good parallelism. A first cut might require the source collection to be read-only during the operation. We might do something like the sketch below.
I suppose this is just a better version of cloneCollection, which we'll want anyway. |
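A hypothetical sketch of such an invocation (the command name, its options, and the namespace below are invented for illustration and are not taken from this ticket):

```js
// Purely illustrative: what a "change shard key" helper might be invoked like.
// None of these names are real commands or options.
db.adminCommand({
    changeShardKey: "mydb.mycollection",   // collection to re-key (hypothetical command name)
    newKey: { userId: 1 },                 // desired new shard key
    readOnly: true                         // first cut: keep the source read-only while copying
})
```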
| Comments |
| Comment by Garaudy Etienne [ 30/Jul/21 ] |
|
We launched Resharding in MongoDB 5.0. So closing this ticket as complete. |
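For reference, the 5.0 feature is exposed through the reshardCollection admin command; a minimal invocation looks roughly like this (the namespace and key are placeholders):

```js
// Reshard an existing sharded collection onto a new shard key (MongoDB 5.0+).
// "mydb.records" and { region: 1, userId: 1 } are placeholder values.
db.adminCommand({
    reshardCollection: "mydb.records",
    key: { region: 1, userId: 1 }
})
```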
| Comment by Jon Hyman [ 02/Feb/16 ] |
|
It would also be helpful to be able to extend a shard key for increased cardinality. We are running into an issue where we shard on {a:1} and have an index on {a:1, b:1}, and after two years we're starting to see some jumbo chunks. It would be nice to be able to extend the shard key to {a:1, b:1} and have the balancer now be able to split chunks on the added cardinality. EDIT: I see https://jira.mongodb.org/browse/SERVER-4246 exists for this. I'll vote, thanks. |
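The suffix-extension case described above was later addressed by the refineCollectionShardKey command (MongoDB 4.4+); a minimal sketch for the {a:1} to {a:1, b:1} example, assuming the supporting {a:1, b:1} index already exists and using a placeholder namespace:

```js
// Extend the existing shard key { a: 1 } with a suffix field (MongoDB 4.4+).
// Requires an index that covers the new key, e.g. { a: 1, b: 1 }.
db.adminCommand({
    refineCollectionShardKey: "mydb.mycollection",
    key: { a: 1, b: 1 }
})
```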
| Comment by Mainak Ghosh [ 22/Dec/15 ] |
|
Hello, I am a 4th-year PhD student at UIUC working with Prof. Indranil Gupta. We have worked on this problem (wrote some code and published a paper). You can find the details of the solution at this link: http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf. We are currently in the process of porting the code to the new Mongo version, as the original code was written against v2.2. Let me know if the solution is of interest to you and we can chat about it. Thanks and regards, |
| Comment by Vincent [ 21/Nov/14 ] |
|
@Anne, done => https://jira.mongodb.org/browse/SERVER-16264 |
| Comment by Anne Moroney [ 20/Nov/14 ] |
|
@Vincent, it might be a good idea, since this ticket may eventually get implemented in the full version. |
| Comment by Vincent [ 20/Nov/14 ] |
|
@Anne, nope, I didn't; I thought this would be enough. |
| Comment by Anne Moroney [ 20/Nov/14 ] |
|
@Vincent, that idea of 'unshard-if-one-shard' sounds good. Did you make a ticket for it? |
| Comment by Zhenyu Li [ 25/Aug/14 ] |
|
I wish there were a tool to change the shard key and re-distribute a large amount of data (in the TB range). I know you guys can figure this out. |
| Comment by Vincent [ 25/Jul/14 ] |
|
At least there should be a way to unshard a collection when all chunks are on the same shard. This way, to change the shard key you'd simply migrate all the chunks to a single shard, unshard, reshard and rebalance. Not ideal, but could fit some use cases. |
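A rough shell sketch of the "migrate all the chunks to a single shard" step, assuming a pre-5.0 config.chunks layout keyed by ns and a placeholder shard name; the "unshard" step itself is the missing piece this comment asks for:

```js
// Drain every chunk of the collection onto a single shard ("shard0000" is a
// placeholder). An "unshard" command would then need to drop the sharding
// metadata so the collection could be re-sharded on a new key.
sh.stopBalancer();
db.getSiblingDB("config").chunks.find({ ns: "mydb.mycollection" }).forEach(function (c) {
    db.adminCommand({
        moveChunk: "mydb.mycollection",
        bounds: [c.min, c.max],
        to: "shard0000"
    });
});
sh.startBalancer();
```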
| Comment by Eliot Horowitz (Inactive) [ 24/Apr/12 ] |
|
Sadly not that simple. Let's say you have 100 shards, and you want to change the shard key from (a) to (b). So the question is: how do you move data while maintaining state and keeping the data live? More later if you're curious... |
| Comment by Mike Hobbs [ 23/Apr/12 ] |
|
Regarding my earlier comment from Mar 13: I'm sure there are subtleties to the implementation of chunks that I'm not fully appreciating, but if there were a special move operation that could move individual records from one shard to another, it would facilitate the migration of records from one key to another. Perhaps such a move operation could be the first step in solving this problem? Again, I am ignorant about the implementation of chunks, so forgive me if such an idea is naive. |
| Comment by Eliot Horowitz (Inactive) [ 14/Mar/12 ] |
|
You still have to handle data changes. |
| Comment by Mike Hobbs [ 13/Mar/12 ] |
|
One option that doesn't require write locks, triggers, or 2X storage capacity:
The sharding config can have 2 shard configurations per collection: a previous configuration and a current configuration. When a collection is re-keyed, the previous configuration is frozen so that no more splits or migrations are performed based on the previous key. The collection is then re-split and moved about based on the new key. As the data is re-distributed, the previous configuration is not updated; it remains frozen.
When a collection has multiple shard configurations, the mongos processes would distribute operations out to several shards based on both the previous and current configurations. A record will exist in its old shard if it has not yet moved, or it could potentially exist in a new shard if it has been moved. (There is an issue here when non-multi updates, upserts, and inserts do not contain both the previous and the new shard key.)
When all chunks have been migrated based on the new key, the previous configuration is removed. |
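A pseudo-code sketch of the routing rule being proposed; the shardsOwning helper is invented here purely to illustrate unioning the targets from both configurations:

```js
// Illustrative pseudo-code only: target a query when a collection carries both
// a frozen "previous" shard configuration and a live "current" one.
function targetShards(query, prevConfig, currConfig) {
    var targets = {};
    // The document may still live where the previous key placed it...
    shardsOwning(prevConfig, query).forEach(function (s) { targets[s] = true; });
    // ...or where the current key has already moved it, so query both sets.
    shardsOwning(currConfig, query).forEach(function (s) { targets[s] = true; });
    return Object.keys(targets);
}
```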
| Comment by Eliot Horowitz (Inactive) [ 12/Mar/12 ] |
|
For making a key more granular, we should probably add a new case, as that's a lot easier to do since it requires no data movement. |
| Comment by Mike Hobbs [ 12/Mar/12 ] |
|
A first pass at providing this functionality might be to allow a shard key to be refined. That is, additional fields could be added to an existing key, which would allow more granularity along the existing keys. We have sometimes discovered that our collections grow in unexpected ways and that a shard key no longer fits in a 64MB chunk. We can currently increase the max chunk size, but ideally, we'd like to redefine the shard key to make it more granular. |
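For reference, the "increase the max chunk size" workaround mentioned above is a documented global setting stored in the config database (value in MB); a minimal sketch, run against a mongos, with 128 chosen arbitrarily:

```js
// Raise the global max chunk size to 128 MB by updating the config database.
db.getSiblingDB("config").settings.updateOne(
    { _id: "chunksize" },
    { $set: { value: 128 } },
    { upsert: true }
)
```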
| Comment by John Crenshaw [ 12/Mar/12 ] |
|
Not needed very often, but on rare occasions it can become really important. I think the most likely use of this may be to fix a collection that was sharded on a linear key (like a MongoId) and needs to be changed to shard against something else (like a hash of that key). IMO supporting writes is probably vital. Sharding isn't likely to be used in environments where it would be OK for writes to fail for a prolonged period of time. If triggers are done first, you could easily ensure that new data is copied over. The process might be:
Also, wondering about some stuff at the end:
|