[SERVER-2545] multi data center support via location aware sharding Created: 13/Feb/11  Updated: 28/Sep/16  Resolved: 15/Jun/12

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 2.1.2

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 14
Labels: rn
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-9045 multi data center support via locatio... Closed
Duplicate
is duplicated by SERVER-3359 sharding - tagging - use for shard se... Closed
Related
Participants:

 Description   

A core feature of MongoDB is the ability to perform atomic operations – i.e. single-document transactions. This implies a primary/secondary architecture for replicas: for a given shard, all writes must be sent to the current primary. In multi data center systems, the primary may not be in the client's data center.

This jira proposes allowing the user to specify a weight or affinity for (super)chunk ranges to help the balancer decide which shard owns which chunks. Each shard's replica set would then (when healthy) keep its primary in the appropriate data center.

This would be important when building regional shards for multi-datacenter clusters, where user/regional data should be kept in the regional shard(s). It would also allow the user to indicate that contiguous chunk ranges should stay on the same shard(s).

This depends on the ability to make the shards/replicas datacenter aware. Without both pieces (location-aware chunk placement and datacenter-aware replicas), you can't create shards that can be used in a region to serve local writes for that region.
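For illustration, the tag-aware sharding that eventually shipped (see the commits below) exposes this kind of affinity through shell helpers. A minimal sketch, assuming a mydb.users collection sharded on { region: 1, uid: 1 }; the shard names, tag names, and namespace are purely illustrative:

// Tag each shard with the data center it lives in (names are hypothetical).
sh.addShardTag("shard-nyc", "NYC")
sh.addShardTag("shard-sfo", "SFO")

// Pin shard-key ranges to those tags; the balancer then keeps matching
// chunks on shards carrying the tag.
sh.addTagRange("mydb.users", { region: "east", uid: MinKey }, { region: "east", uid: MaxKey }, "NYC")
sh.addTagRange("mydb.users", { region: "west", uid: MinKey }, { region: "west", uid: MaxKey }, "SFO")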



 Comments   
Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-15T15:12:20-07:00

Message: SERVER-2545 - sanity check tag ranges
Branch: master
https://github.com/mongodb/mongo/commit/8cb54a63e154cec11f1b07ab124e762c3d0d9dff

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-15T12:22:54-07:00

Message: SERVER-2545 - start of a few shell helpers
Branch: master
https://github.com/mongodb/mongo/commit/297b940ee79fbba5fa2dd9c15a3af4eb9e87a6d0

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-15T10:02:50-07:00

Message: SERVER-2545 - fix test
Branch: master
https://github.com/mongodb/mongo/commit/a289a4809b140c5a24f60b97872d3e68602536a9

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-15T10:02:32-07:00

Message: SERVER-2545 - randomize order that we balance tags
Branch: master
https://github.com/mongodb/mongo/commit/14f33c4ec4c5785726195224dfd993140de5c158

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-15T09:38:48-07:00

Message: SERVER-2545 - balancer fully tag aware
Branch: master
https://github.com/mongodb/mongo/commit/f9ec6b8c2da49235cc61d2211b25e0a7500d0d17

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-14T18:57:35-07:00

Message: SERVER-2545 BalancerPolicy fully tag aware
Branch: master
https://github.com/mongodb/mongo/commit/f975fac3887e405eb41452a69ad5be9b1a74ae90

Comment by auto [ 15/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-14T12:40:26-07:00

Message: SERVER-2545 - draining tag aware
Branch: master
https://github.com/mongodb/mongo/commit/76e8d89b5ca79fdb6033c5663cf14a9cf60f7484

Comment by auto [ 13/Jun/12 ]

Author: Eliot Horowitz <eliot@10gen.com>
Date: 2012-06-13T15:24:54-07:00

Message: clean balancer policy a bit in prep of SERVER-2545
Branch: master
https://github.com/mongodb/mongo/commit/60c041e473b59e9451da79ac32b9f1cd2c686b28

Comment by Alon Horev [ 03/May/12 ]

We need 'explicit' sharding in order to divide our data between data centers in a way that minimizes cross-datacenter queries. We are going to turn off the balancer and move the chunks ourselves, but I think a simple solution is possible: add a field to every document in the chunks collection listing the shards it is allowed to be split or moved to (e.g. chunk X can only be split/moved to other shards in the same data center). An even cleaner solution is tagging: chunk X should stay in the "east", where every shard can be tagged "east" or "west".
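A minimal sketch of that tagging idea, expressed in terms of the config metadata the released feature ended up using (the shard names, "east"/"west" tags, shard key { region: 1 }, and mydb.users namespace are illustrative; the schema shown matches the shipped feature, not necessarily what existed at the time of this comment):

// Run through mongos; operate on the config database.
var conf = db.getSiblingDB("config")

// Tag shards with the data center they belong to.
conf.shards.update({ _id: "shard-east-1" }, { $addToSet: { tags: "east" } })
conf.shards.update({ _id: "shard-west-1" }, { $addToSet: { tags: "west" } })

// Declare that the "east" range of mydb.users must stay on shards tagged "east"
// (with shard key { region: 1 }, the "east" range ends where "west" begins).
conf.tags.insert({
    _id: { ns: "mydb.users", min: { region: "east" } },
    ns: "mydb.users",
    min: { region: "east" },
    max: { region: "west" },
    tag: "east"
})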

Comment by Robert Sindall [ 17/Apr/12 ]

What do we have to do to use this feature, i.e. what is the syntax?

Comment by Eliot Horowitz (Inactive) [ 03/Mar/12 ]

2.1.1 has slipped a bit - updating the date, but this is still planned for 2.1.1.

Comment by Ryan Breen [ 02/Mar/12 ]

Is this still on target for 2.1.1, due today?

Comment by Yuri Finkelstein [ 22/Feb/12 ]

Whether this will be implemented as described above or differently, the important piece here is that Mongo needs to break out of this concept of a single master for writes. That is very limiting for customers and creates asymmetry in the deployment, making it difficult to maintain proper disaster recovery compliance and to guarantee latency SLAs for writes when the writes originate in a different data center. The importance of this item should be elevated!

Comment by Grégoire Seux [ 06/Feb/12 ]

I am really happy that you are starting to plan this; it will be really useful for my company.
Thanks!

Comment by Chris Westin [ 07/Dec/11 ]

@Robert Sindall: That's another twist I wasn't thinking of. I was thinking of countries that have laws regarding the length of time for which personal data can be retained. Regarding "data that cannot exist," what happens if a general query hits the cluster, and material from the shard you suggest satisfies the query and would be returned in a data center in another country? Laws that assume the physical location of the bits is relevant to where the data is served are going to make this even more complicated.

Comment by Robert Sindall [ 07/Dec/11 ]

@Chris Westin: Thanks for responding. I am not 100% clear on your last point. Regarding your first point, data that cannot exist outside of a geographic region due to data protection laws would have to be in a shard belonging to a replica set that spans racks but not DCs, and we would need to add an off-site backup process (which conforms to local data protection laws). Would that work / be possible?

Comment by Chris Westin [ 06/Dec/11 ]

@Robert Sindall: I don't think this will help you with differing data protection laws. Because the shards would have replicas in the other locations (for data safety and failover), you would probably have to have all data subject to the most stringent laws in effect in any of the locations.

Comment by Robert Sindall [ 06/Dec/11 ]

This would be a very useful feature, especially due to different data protection laws in different regions.

Comment by Eliot Horowitz (Inactive) [ 24/Nov/11 ]

It is most likely going to be in 2.2

Comment by Grégoire Seux [ 24/Nov/11 ]

Is there any news on this (really awesome) feature?

Comment by Grégoire Seux [ 19/Oct/11 ]

Same issue here: we have several datacenters in Europe and the US. Managing several clusters is a pain and adds complexity to the application layer. If chunk balancing could be limited to a given tag, it would be a very good step.

Comment by Hugh Watkins [ 07/Sep/11 ]

We have this issue currently. We are creating a national application that is hosted at multiple datacenters; MongoDB is sharded and replicated to all of them. For example, let's say Atlanta, Denver, and New York are the data centers.

The majority of the time (99+%) a user would be served from the same datacenter based on geography, i.e. subscribers in the Mid-West or West would be served from Denver, CO. We would like writes for those users to be local to Denver, through the use of a tag or shard key.

The problem is that, without setting up separate MongoDB databases, I don't have a way to make sure the master is located in the same datacenter as the subscriber's information, so updates could get routed to a different datacenter. Setting up separate databases makes my application overly complex.
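A minimal sketch of how this scenario could be handled once tag-aware sharding landed, combined with replica set priorities to keep the Denver shard's primary in Denver. The host names, shard name, tag, app.subscribers namespace, and { region: 1, userId: 1 } shard key are all illustrative assumptions:

// 1) On the Denver shard's replica set (connected directly, not via mongos),
//    give the Denver member the highest priority so it is preferred as primary.
cfg = rs.conf()
cfg.members[0].priority = 2   // denver-a.example.net:27017
cfg.members[1].priority = 1   // atlanta-a.example.net:27017
cfg.members[2].priority = 1   // newyork-a.example.net:27017
rs.reconfig(cfg)

// 2) Through mongos, pin the Mid-West/West subscribers' key range to that shard,
//    so their writes land on the Denver primary.
sh.addShardTag("shard-denver", "DEN")
sh.addTagRange("app.subscribers", { region: "west", userId: MinKey }, { region: "west", userId: MaxKey }, "DEN")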

Comment by Eliot Horowitz (Inactive) [ 13/Feb/11 ]

Not sure this is exactly what we'll do, but we have a number of ideas pending.
