[SERVER-3088] Combine Replica Sets and Shards to a single system, using RAID5 style parity Created: 12/May/11  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: New Feature Priority: Major - P3
Reporter: Colin Davis Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

As discussed at http://groups.google.com/group/mongodb-user/browse_thread/thread/c18a9d181de2e5f6 , it would be tremendously useful for us to be able to use a RAID-5/6 style distributed parity system to combine ReplicaSets with Shards.

The current system of sharding and using replica sets requires that we have 2x servers as we have shards.
While this does gives us is effectively RAID0+1 level protection.
Every bit is replicated to another server that is ensuring it's safe.

It would dramatically reduce the number of servers that large deployments need, if we could instead use RAID-5/6 style protection.
The RAID5 system would store and distribute a parity block, that allows the other blocks to be reconstructed, using the remaining data, and XOR.

While I'm sure you know how RAID works, it might be helpful to quickly review the exact mechanism-
http://www.scottklarr.com/topic/23/how-raid-5-really-works/

So in this system, with 5 shards, we'd use six servers.
Each shard would store 1/5th of the data, plus a parity block.

When any one of the systems went down, we could reconstruct it, by using the data remaining on the other 5.

You can increase redundancy, by increasing the number of parity blocks.

RAID6 uses the same system, but two parity blocks, to increase the overall reliability of the system, and ensure it can handle two
failures at the same time.

If Mongo were to support such a system, companies could deploy dramatically fewer servers, while maintaining a very high level of reliability and failover.



 Comments   
Comment by Colin Davis [ 14/May/11 ]

I apologize for not understanding-

Do you mean each Physical Hardware server would have one shard server, and two replica sets for shard servers on other machines?

From my understanding of Mongo Sharding and replication, as described on the web and at http://bit.ly/dm6G9v
Mongo replication requires that each shard consists of 2+ replica set members; The primary, and at least one secondary.

If you had 5 Physical Hardware servers, you could run multiple Mongod instances on each server, giving you redundancy, but it would still require the same number of processes running in total, just spread across fewer machines, right?
That would increase the RAM requirements on each server, versus using explicit parity that is distributed..

Do continue our RAID analogy, If I'm understanding you correctly, you seem to be suggesting partitioning the my drives, and then using RAID 0+1 to mirror partitions.

I still need 2x-1 the server resources that I would with a RAID-5 style setup, no?

I apologize again if I'm missing something.
-CPD

Comment by Eliot Horowitz (Inactive) [ 14/May/11 ]

You can do this manually now.
So if you have 5 servers, you could have 5 shards.
Each node will handle 1 master and 2 secondaries.

Generated at Thu Feb 08 03:02:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.