[SERVER-19884] All config servers crash Created: 12/Aug/15  Updated: 12/Aug/15  Resolved: 12/Aug/15

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: patrick wong Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-1448 Host sharding config data on a replic... Closed

 Description   

If all three config servers crash, the entire cluster becomes non-functional even though all of the data and the data servers are still functional. Restoring the whole cluster then becomes the only way to recover it. For a multi-terabyte MongoDB deployment, that recovery may take a few days, which is not affordable for some production systems.

The data servers are easily protected by adding extra members to each replica set. However, we only have three config servers, and there is no way to store the config changes made between config server backups.

Request:

Once we restore an outdated config server, we need a way (a new feature or a tool) to synchronize the data between the outdated config server and the existing data servers, or to recover the lost config metadata from the existing data servers

OR

We need a way to restore the config servers to the point that the existing data servers have reached

  • The probability of all config servers failing should be minimal. However, with a new storage engine, an issue like SERVER-18316 may occur again. If that happens during a site loss or an unsafe power-off of the building, a failure of all config servers becomes possible. For a highly available database solution, restorations that take days should be avoided as much as possible.


 Comments   
Comment by Ramon Fernandez Marina [ 12/Aug/15 ]

patrickwong@wisers.com, if I understand correctly, you're describing a scenario where all three config servers crash at the same time and are unable to start up again (for example, because of SERVER-18316, or some other hardware/software failure). This is a very extreme case, and, as you say, the probability of this scenario should be minimal, since any high-availability deployment would surely have a UPS system to prevent unclean shutdowns.

In this scenario having more config servers will not help if they're subject to the same failure mode as the others, e.g., because they're running on the same machine, or on the same rack and the rack loses power. Since the purpose of having three config servers is for redundancy, every sharded deployment should make sure that no more than two config servers may be affected by any single point of failure.
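The single-point-of-failure rule above can be checked mechanically against an inventory of where each config server runs. The following is a minimal sketch, assuming a hypothetical inventory format in which each config server is mapped to the failure domains it depends on (host, rack, power feed); the function name and labels are illustrative only and are not part of MongoDB. With three config servers, "no more than two affected by any single failure" is equivalent to "no failure domain is shared by all of them", which is what the function reports.

```python
from collections import Counter

# Hypothetical inventory format (not a MongoDB API): each config server name
# maps to the failure domains it depends on, e.g. {"host": ..., "rack": ...}.
ConfigPlacement = dict[str, dict[str, str]]


def domains_taking_down_all(placement: ConfigPlacement) -> list[tuple[str, str]]:
    """Return every (domain_kind, domain_value) pair shared by ALL config servers.

    Any pair returned is a single point of failure: one outage in that domain
    (for example, a rack or power feed going down) would take out every
    config server at once.
    """
    counts: Counter = Counter()
    for domains in placement.values():
        for kind, value in domains.items():
            counts[(kind, value)] += 1
    total = len(placement)
    return sorted(key for key, n in counts.items() if n == total)


# Example: separate hosts, two racks, but a single shared power feed.
placement = {
    "cfg1": {"host": "h1", "rack": "r1", "power": "feed-a"},
    "cfg2": {"host": "h2", "rack": "r1", "power": "feed-a"},
    "cfg3": {"host": "h3", "rack": "r2", "power": "feed-a"},
}
print(domains_taking_down_all(placement))  # [('power', 'feed-a')]
```

Checking every kind of domain, rather than only hosts, matters because servers on different machines can still share a rack or power feed, which is exactly the failure mode described in the comment.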

That being said, as part of SERVER-1448 config servers will become replica sets, so users will be able to add more nodes to their config server setup. Since I believe this will address the request you're making, I'm going to close this ticket as a duplicate of SERVER-1448.

SERVER-1448 is scheduled for the current development cycle, and at the time of this writing we expect it to be part of the upcoming MongoDB 3.2, which should be available in Q4 2015. Please watch SERVER-1448 for updates if you're interested in this feature.

Regards,
Ramón

Generated at Thu Feb 08 03:52:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.