[SERVER-2545] multi data center support via location aware sharding Created: 13/Feb/11 Updated: 28/Sep/16 Resolved: 15/Jun/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 2.1.2 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 14 |
| Labels: | rn |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Participants: |
| Description |
|
A core feature of MongoDB is the ability to perform atomic operations, i.e. single-document transactions. This implies a primary/secondary architecture for replicas: for a given shard, all writes must be sent to the current primary. In a multi-data-center system, that primary may not be in the client's data center. This issue proposes letting the user specify a weight or affinity for (super)chunk ranges to help the balancer decide which shard holds which chunks; each shard would then (when healthy) keep its primary in the appropriate data center. This matters when building regional shards for multi-data-center clusters where user/regional data should stay in the regional shard(s). It would also let the user indicate that contiguous chunk ranges should live on the same shard(s). Both of these depend on making the shards/replica sets data-center aware; without both, you cannot create shards that serve local writes for their region. |
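For illustration, a minimal mongo shell sketch of the tag-aware sharding this proposal evolved into (available around 2.2). The shard names, tag names, and the mydb.users collection/shard key are hypothetical:

```
// Run in a mongo shell connected to a mongos.
// Tag each shard with its data center ("shardEast"/"shardWest" and
// "ATL"/"DEN" are made-up names).
sh.addShardTag("shardEast", "ATL");
sh.addShardTag("shardWest", "DEN");

// Pin contiguous shard-key ranges to a tag so the balancer keeps those
// chunks on shards carrying that tag (and hence in that data center).
sh.addTagRange("mydb.users",
               { region: "east", uid: MinKey },
               { region: "east", uid: MaxKey }, "ATL");
sh.addTagRange("mydb.users",
               { region: "west", uid: MinKey },
               { region: "west", uid: MaxKey }, "DEN");
```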
| Comments |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-15T15:12:20-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-15T12:22:54-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-15T10:02:50-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-15T10:02:32-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-15T09:38:48-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-14T18:57:35-07:00). Message: |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-14T12:40:26-07:00). Message: |
| Comment by auto [ 13/Jun/12 ] |
|
Author: Eliot Horowitz <eliot@10gen.com> (2012-06-13T15:24:54-07:00). Message: clean balancer policy a bit in prep of |
| Comment by Alon Horev [ 03/May/12 ] |
|
We need 'explicit' sharding in order to divide our data between data centers in a way that minimizes cross-data-center queries. We are going to turn off the balancer and move the chunks ourselves, but I think a simple solution is possible: |
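The manual workaround described above roughly corresponds to something like the following in a mongo shell connected to a mongos; the namespace, shard-key values, and shard name are placeholders:

```
// Disable automatic chunk migrations cluster-wide.
sh.stopBalancer();

// Manually move the chunk containing a given shard-key value to a
// specific shard ("shardEU" is a placeholder shard name).
db.adminCommand({
    moveChunk: "mydb.users",
    find: { region: "eu", uid: 12345 },
    to: "shardEU"
});
```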
| Comment by Robert Sindall [ 17/Apr/12 ] |
|
What do we have to do to use this feature, i.e. what is the syntax? |
| Comment by Eliot Horowitz (Inactive) [ 03/Mar/12 ] |
|
2.1.1 has slipped a bit - updating date. |
| Comment by Ryan Breen [ 02/Mar/12 ] |
|
Is this still on target for 2.1.1, due today? |
| Comment by Yuri Finkelstein [ 22/Feb/12 ] |
|
Whether this is implemented as described above or differently, the important piece is that Mongo needs to break out of the single-master-for-writes model. It is very limiting for customers and creates asymmetry in the deployment, making it difficult to maintain proper disaster-recovery compliance and to guarantee write-latency SLAs when writes originate in a different data center. The importance of this item should be elevated! |
| Comment by Grégoire Seux [ 06/Feb/12 ] |
|
I am really happy that you are starting to plan this, it will be really useful for my company. |
| Comment by Chris Westin [ 07/Dec/11 ] |
|
@Robert Sindall: That's another twist I wasn't thinking of. I was thinking of countries that have laws regarding the length of time for which personal data can be retained. Regarding "data that cannot exist," what happens if a general query hits the cluster, and material from the shard you suggest satisfies the query and would be returned in a data center in another country? Laws that assume the physical location of the bits is relevant to where the data is served are going to make this even more complicated. |
| Comment by Robert Sindall [ 07/Dec/11 ] |
|
@Chris Westin: Thanks for responding. I am not 100% clear on your last point. Regarding your first point: data that cannot exist outside of a geographic region due to data protection laws would have to live in a shard whose replica set spans racks but not data centers, and we would need to add an off-site backup process that conforms to the local data protection laws. Would that work / be possible? |
| Comment by Chris Westin [ 06/Dec/11 ] |
|
@Robert Sindall: I don't think this will help you with differing data protection laws. Because the shards would have replicas in the other locations (for data safety and failover), you would probably have to have all data subject to the most stringent laws in effect in any of the locations. |
| Comment by Robert Sindall [ 06/Dec/11 ] |
|
This would be a very useful feature, especially given the different data protection laws in different regions. |
| Comment by Eliot Horowitz (Inactive) [ 24/Nov/11 ] |
|
It is most likely going to be in 2.2 |
| Comment by Grégoire Seux [ 24/Nov/11 ] |
|
Is there any news on this (really awesome) feature? |
| Comment by Grégoire Seux [ 19/Oct/11 ] |
|
Same issue here: we have several data centers in Europe and the US. Managing several clusters is a pain and adds complexity to the application layer. If chunk balancing could be limited to a given tag, that would be a very good step. |
| Comment by Hugh Watkins [ 07/Sep/11 ] |
|
We have this issue currently. We are building a national application hosted in multiple data centers, with MongoDB sharded and replicated across all of them; say Atlanta, Denver, and New York are the data centers. The vast majority of the time (99%+) a user would be served from the same data center based on geography, e.g. subscribers in the Midwest or West would be served from Denver, CO. We would like writes for those users to be local to Denver, via a tag or the shard key. The problem is that, without setting up separate MongoDB databases, I have no way to ensure the primary is located in the same data center as the subscriber's information, so updates can get routed to a different data center. Setting up different databases makes my application overly complex. |
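For the primary-placement half of this (keeping each regional shard's primary in its home data center), one option is to weight replica set member priorities per shard; a minimal sketch, assuming a three-member set with placeholder host roles:

```
// Run against the Denver shard's replica set: give the Denver members the
// highest priority so one of them is elected primary whenever it is healthy.
cfg = rs.conf();
cfg.members[0].priority = 2;    // denver-1 (placeholder host)
cfg.members[1].priority = 2;    // denver-2 (placeholder host)
cfg.members[2].priority = 0.5;  // atlanta-1; still votes and can fail over
rs.reconfig(cfg);
```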
| Comment by Eliot Horowitz (Inactive) [ 13/Feb/11 ] |
|
Not sure this is exactly what we'll do, but we have a number of ideas pending. |