[SERVER-4000] command to change shard key of a collection Created: 02/Oct/11  Updated: 06/Dec/22  Resolved: 30/Jul/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0

Type: New Feature Priority: Major - P3
Reporter: Dwight Merriman Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Done Votes: 31
Labels: sharding-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-9845 Command to unshard a collection Closed
is duplicated by SERVER-30856 Create Easy Way to Change Shard Key Closed
Related
related to SERVER-4246 allow resharding by more-specific sha... Closed
related to SERVER-14813 Upsert and Shard is tightly coupled, ... Closed
is related to SERVER-16264 Allow unsharding a collection when al... Closed
Assigned Teams:
Sharding NYC
Participants:

 Description   

Changing shard keys is fundamentally very expensive, but a helper to do it would be useful. The main thing needed would be to perform the operation with good parallelism.

A first cut might require the source collection to be read-only during the operation.

The operation might do something like:

  • measure what the new distribution would look like by examining a sampled set of records from the source collection
  • presplit based on the statistics above
  • do a cluster-wide copy of data from the source to the destination collection
  • build the index(es) for the destination after the copy to make things as fast as possible
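The sample-then-presplit step above can be sketched in a few lines. This is a hypothetical illustration; the helper name and the evenly-spaced-quantile strategy are mine, not anything in the server:

```javascript
// Hypothetical helper: derive split points from a sample of shard-key
// values by taking evenly spaced quantiles of the sorted sample, so
// each resulting chunk receives roughly the same number of documents.
function presplitPoints(sampledKeys, numChunks) {
  const s = [...sampledKeys].sort((a, b) => a - b);
  const points = [];
  for (let i = 1; i < numChunks; i++) {
    points.push(s[Math.floor((i * s.length) / numChunks)]);
  }
  return points;
}

// e.g. a uniform sample of 1000 keys split into 4 chunks
const sample = Array.from({ length: 1000 }, (_, i) => i);
console.log(presplitPoints(sample, 4)); // -> [ 250, 500, 750 ]
```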

I suppose this is just a better version of cloneCollection, which we'll want anyway.



 Comments   
Comment by Garaudy Etienne [ 30/Jul/21 ]

We launched Resharding in MongoDB 5.0, so I'm closing this ticket as complete.
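For readers landing here later: the 5.0 feature shipped as the reshardCollection command. A minimal invocation, run against a mongos, looks like the following (the namespace and new key are placeholders, not anything from this ticket):

```javascript
// "test.orders" and the new key are illustrative placeholders.
db.adminCommand({
  reshardCollection: "test.orders",
  key: { order_id: 1 }
})
```

The shell helper sh.reshardCollection("test.orders", { order_id: 1 }) is equivalent.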

Comment by Jon Hyman [ 02/Feb/16 ]

It would also be helpful to be able to extend a shard key for increased cardinality. We are running into an issue where we shard on

{a:1}

and have an index on

{a:1, b:1}

and after two years we're starting to see some jumbo chunks. It would be nice to be able to extend the shard key to

{a:1, b:1}

and have the balancer now be able to split chunks on the added cardinality.

EDIT: I see https://jira.mongodb.org/browse/SERVER-4246 exists for this. I'll vote, thanks.
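The extension Jon describes later shipped as refineCollectionShardKey in MongoDB 4.4. Assuming the supporting {a: 1, b: 1} index already exists (as it does in his setup), the call looks roughly like this, with an illustrative namespace:

```javascript
// Requires an existing index that covers the refined key.
db.adminCommand({
  refineCollectionShardKey: "test.coll",  // namespace is illustrative
  key: { a: 1, b: 1 }
})
```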

Comment by Mainak Ghosh [ 22/Dec/15 ]

Hello,

I am a 4th-year PhD student at UIUC working with Prof. Indranil Gupta. We have worked on this problem (wrote some code and published a paper). You can find the details of the solution at http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf. We are currently porting the code to the new MongoDB version, as the original code was written against v2.2. Let me know if the solution is of interest to you and we can chat about it.

Thanks and Regards,
Mainak Ghosh.

Comment by Vincent [ 21/Nov/14 ]

@Anne, done => https://jira.mongodb.org/browse/SERVER-16264

Comment by Anne Moroney [ 20/Nov/14 ]

@Vincent, it might be a good idea since this ticket may possibly eventually get implemented in the full version.

Comment by Vincent [ 20/Nov/14 ]

@Anne Nope I didn't, I thought this would be enough

Comment by Anne Moroney [ 20/Nov/14 ]

@Vincent, that idea of 'unshard-if-one-shard' sounds good. Did you make a ticket for it?

Comment by Zhenyu Li [ 25/Aug/14 ]

I wish there were a tool to change the shard key and redistribute a large amount of data (in the TB range). I know you guys can figure this out.

Comment by Vincent [ 25/Jul/14 ]

At least there should be a way to unshard a collection when all chunks are on the same shard. That way, to change the shard key you'd simply migrate all the chunks to a single shard, unshard, reshard, and rebalance. Not ideal, but it could fit some use cases.
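The chunk-draining half of this workaround can be sketched in the shell; the namespace and shard name below are placeholders, and note that at the time there was no supported "unshard" step (hence SERVER-16264):

```javascript
// Sketch: drain every chunk of an illustrative namespace to one shard.
sh.stopBalancer()
db.getSiblingDB("config").chunks
  .find({ ns: "test.coll" })
  .forEach(function (c) {
    sh.moveChunk("test.coll", c.min, "shard0000")
  })
```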

Comment by Eliot Horowitz (Inactive) [ 24/Apr/12 ]

Sadly not that simple.

Let's say you have 100 shards and you want to change the shard key from (a) to (b).
If those keys aren't related, then 99% of the data has to be moved.
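The 99% figure follows from a simple argument: if the new key is unrelated to the old one, a document's new owning shard is effectively uniform over all N shards, so only 1/N of the data stays put on average. A quick check, purely illustrative:

```javascript
// With an unrelated new key, each document's new owning shard is
// effectively uniform over all N shards, so 1/N of the data stays
// put on average and (N - 1)/N must move.
function expectedMovedFraction(numShards) {
  return (numShards - 1) / numShards;
}

console.log(expectedMovedFraction(100)); // -> 0.99
```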

So the question is: how do you move data while maintaining state and keeping the data live?

More later if you're curious...

Comment by Mike Hobbs [ 23/Apr/12 ]

Regarding my earlier comment from Mar 13:

I'm sure there are subtleties to the implementation of chunks that I'm not fully appreciating, but if there were a special move operation that could move individual records from one shard to another, it would facilitate the migration of records from one key to another. Perhaps such a move operation could be the first step in solving this problem?

Again, I am ignorant about the implementation of chunks, so forgive me if such an idea is naive.

Comment by Eliot Horowitz (Inactive) [ 14/Mar/12 ]

You still have to handle data changes.
If I change the shard key from (a) to (b) the documents in each chunk will be totally different.
The metadata changes aren't so bad; it's the actual in-flight data that's hard.

Comment by Mike Hobbs [ 13/Mar/12 ]

One option that doesn't require write locks, triggers, or 2X storage capacity:

The sharding config can have 2 shard configurations per collection - a previous configuration and a current configuration. When a collection is re-keyed, the previous configuration is frozen so that no more splits or migrations are performed based on the previous key. The collection is then re-split and moved about based on the new key. As the data is re-distributed, the previous configuration is not updated - it remains frozen.

When a collection has multiple shard configurations, the mongos processes would distribute operations out to several shards based on both the previous and current configurations. A record will exist in its old shard if it has not yet moved, or it could potentially exist in a new shard if it has been moved. (There is an issue here when non-multi updates, upserts, and inserts do not contain both the previous and the new shard key.)

When all chunks are migrated based on the new key, the previous configuration is removed.
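The dual-configuration routing described above can be sketched as a toy model. The range-map representation and names here are my own simplification, not the server's actual config metadata:

```javascript
// Toy model of range-based routing: a config maps a shard-key value
// to a shard via sorted split points (shards.length === splitPoints.length + 1).
function shardFor(config, key) {
  let i = 0;
  while (i < config.splitPoints.length && key >= config.splitPoints[i]) i++;
  return config.shards[i];
}

// While a re-key is in flight, a record may live in the shard implied
// by the frozen previous config or by the current one, so route to both.
function targets(doc, prevCfg, prevKey, curCfg, curKey) {
  return new Set([shardFor(prevCfg, doc[prevKey]), shardFor(curCfg, doc[curKey])]);
}

const prev = { splitPoints: [50], shards: ["s0", "s1"] };          // keyed on "a"
const cur = { splitPoints: [10, 20], shards: ["s0", "s1", "s2"] }; // keyed on "b"
console.log([...targets({ a: 7, b: 25 }, prev, "a", cur, "b")].sort()); // -> [ 's0', 's2' ]
```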

Comment by Eliot Horowitz (Inactive) [ 12/Mar/12 ]

For making a key more granular we should probably add a new case, as that's a lot easier to do since it requires no data movement.

Comment by Mike Hobbs [ 12/Mar/12 ]

A first pass at providing this functionality might be to allow a shard key to be refined. That is, additional fields could be added to an existing key, which would allow more granularity along the existing keys.

We have sometimes discovered that our collections grow in unexpected ways and that a shard key no longer fits in a 64MB chunk. We can currently increase the max chunk size, but ideally, we'd like to redefine the shard key to make it more granular.

Comment by John Crenshaw [ 12/Mar/12 ]

Not needed very often, but on rare occasion it can become really important. I think the most likely use of this may be to fix a collection that was sharded on a linear key (like a MongoId) and needs to be changed to shard against something else (like a hash of that key).

IMO, supporting writes is probably vital. Sharding isn't likely to be used in environments where it would be OK for writes to fail for a prolonged period of time. If triggers are implemented first, you could easily ensure that new data is copied over. The process might be:

  • Create a new sharded destination collection
  • Guess at distribution and presplit
  • Add insert/update/delete triggers on the source collection to clone any changes to the destination collection
  • Copy the data
  • Add the indexes to the destination collection
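The copy-while-applying-triggers idea in the steps above can be sketched abstractly. This toy model (plain objects standing in for collections, an array standing in for the captured trigger log) is purely illustrative:

```javascript
// Toy model of the trigger-based copy: snapshot-copy the source, then
// replay changes captured by the triggers while the copy was running.
function copyWithTriggers(source, triggerLog) {
  const dest = { ...source }; // bulk copy
  for (const { op, key, doc } of triggerLog) {
    if (op === "insert" || op === "update") dest[key] = doc;
    else if (op === "delete") delete dest[key];
  }
  return dest;
}

const src = { 1: { a: 1 }, 2: { a: 2 } };
const log = [
  { op: "update", key: 1, doc: { a: 10 } },
  { op: "delete", key: 2 },
  { op: "insert", key: 3, doc: { a: 3 } },
];
console.log(copyWithTriggers(src, log)); // -> { '1': { a: 10 }, '3': { a: 3 } }
```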

Also, I'm wondering about some steps at the end:

  • Sanity checks? (verify that record counts match, possibly other checks?)
  • Rename the collections dest>>src? (lock needed to make this atomic?)
  • Remove the old source collection or not?
Generated at Thu Feb 08 03:04:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.