[SERVER-1998] Support cluster-wide backup option(s) Created: 25/Oct/10  Updated: 24/Jun/15  Resolved: 24/Jun/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-4306 Reference script for backing up a sha... Closed
Participants:

 Description   

Create some option to do a consistent backup of the config servers + shards at the same time. Preferably this would be initiated from a single point (a mongos while no balancing is going on) and would hit specified replica nodes in each shard/config replica set or master/slave pair (if they are RS/MS) to reduce the load on the live system.

In a multi-node cluster it is important to take a consistent snapshot in time that can be restored.

Oh, yeah, and there needs to be a restore option as well.



 Comments   
Comment by Ian Whalen (Inactive) [ 24/Jun/15 ]

Hi everyone, thanks for your input and apologies for the delay in responding to this feature request.

I'm closing this as Won't Fix since it is not a functionality that we intend to build into the database itself, and instead recommend using the Cloud Manager Backup or Ops Manager to maintain cluster-wide backups.

Comment by Andrew Armstrong [ 03/Mar/11 ]

Was just reading http://groups.google.com/group/mongodb-user/browse_thread/thread/3388b378125e9978 and it reminded me of this ticket.

The problem as I understand it now is that taking a backup of Shard #1 and restoring that backup an hour later (if the shard were to crash) would not be good enough:
a) Chunks originally on this shard (and in the backup) may no longer exist on the shard (the balancer moved them to another shard)
b) Chunks allocated to this shard were never in the previous backup, but were moved to this shard after it was taken. Their data is, however, available in a different shard's backup (but that shard no longer owns these chunks)

Thinking out loud about this; what about a backup system where:
a) You ask the mongo cluster to perform a system-wide backup. It's an operation that happens for the entire cluster.
b) Each shard writes its local backup to a specific directory (or uploads to S3, whatever) independently in parallel

Assuming then after this backup is taken; Shard #1 catches on fire and you need to restore Shard #1 after buying new servers:
a) You ask the mongo cluster to restore Shard #1
b) Mongod Shard #1 primary (for example) asks the config servers where Shard #1's backup data should be, based on the latest cluster backup available
c) Config servers tell Shard #1 primary:

  • You need to download /backup/shard001/ data files (majority of your shard's data is here)
  • You need to also then grab a few chunks from /backup/shard002/ because between the time the last backup was taken, and when Shard #1 caught fire, you had some more chunks allocated to you that aren't in your original Shard #1 backup (but are available in shard002's backup files - the original chunk owner)
  • You need to ignore restoring chunks XXX from your backup files because they were since given a new owner on a different shard, and so you don't need to try restoring them as you don't own them anymore

Basically; let the mongo cluster be backup-aware and know how to restore data even if chunks have since moved around.

You just need to make sure enough backup space is available.

Thoughts?

  • Andrew
Comment by Dwight Merriman [ 10/Feb/11 ]

In a system that's been up a while, having the balancer off even for a full 24 hours might be OK. With the balancer off, would backups of the whole cluster be OK then?

Comment by Andrew Armstrong [ 06/Feb/11 ]

Perhaps a command to initiate a backup of the cluster/shards would be better?

A single admin command could be issued to the cluster to perform a backup.

This would tell each shard master (or perhaps an up-to-date slave) to perform a local server backup and report back to the requesting client where that backup is stored on disk locally (or update a cluster-wide status document with how it's going / when it's finished).

You could then perhaps have the server operator automate uploading the backups offsite or to Amazon S3 via SSH to each server, etc.

Comment by Eliot Horowitz (Inactive) [ 25/Oct/10 ]

Most people use sharding when their data gets large - so I disagree; not being able to back up to a single machine will be common.

We should document the correct procedure in the wiki.

Comment by Scott Hernandez (Inactive) [ 25/Oct/10 ]

Yep, I had thought about all those issues, but I suspect many people will not have that much data. I would guess that for every 1 cluster that is too big to back up, there will be at least 20 that can.

If nothing else this should be documented on the wiki/docs.

Maybe the easiest option would be to use a (non-production) mongos instance where the backup runs with the slaveOk option. That would require mongodump to run with slaveOk on the query, which sounds like another issue to create. Also, wouldn't parts of the config db need to be skipped? Because you would want to set up the cluster before restoring the data + indexes.
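For illustration, the slaveOk-through-mongos idea above might look roughly like this today. This is only a sketch: the host names are placeholders, and `--readPreference` is the modern mongodump flag that took over the role slaveOk plays in this comment (it postdates this ticket):

```shell
# Dump through a dedicated, non-production mongos, reading from
# secondaries to keep query load off the primaries.
# (backup-mongos.example.net is a hypothetical host.)
mongodump --host backup-mongos.example.net:27017 \
          --readPreference secondary \
          --out /backup/cluster

# Per the comment above, a restore would skip (parts of) the config
# database: set up the cluster first, then restore data + indexes.
```

Note this still has the consistency problem discussed elsewhere in this ticket: without the balancer off, chunks can migrate mid-dump.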

Comment by Eliot Horowitz (Inactive) [ 25/Oct/10 ]

Also - mongodump through mongos will include every object even if migrations are happening live.

Comment by Eliot Horowitz (Inactive) [ 25/Oct/10 ]

The reason we don't support a general option like this is that in many sharded systems it's actually impossible to do, since the data can't fit on a single host.

Currently you can turn off the balancer, and then back up the config servers and each shard independently.
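That manual procedure could be sketched roughly as follows. This is a hedged outline, not an official script from this ticket: host names and backup paths are placeholders, and `sh.stopBalancer()` / `sh.startBalancer()` are the mongo shell helpers for toggling the balancer:

```shell
# 1. Stop the balancer so no chunk migrations happen during the backup.
mongo --host mongos.example.net --eval 'sh.stopBalancer()'

# 2. Dump the config database from a config server.
mongodump --host cfg1.example.net --db config --out /backup/config

# 3. Dump each shard independently (ideally from a secondary to
#    reduce load on the live system), in parallel if desired.
mongodump --host shard1-secondary.example.net --out /backup/shard001
mongodump --host shard2-secondary.example.net --out /backup/shard002

# 4. Re-enable the balancer once all dumps have finished.
mongo --host mongos.example.net --eval 'sh.startBalancer()'
```

The balancer must stay off for the whole window; otherwise the per-shard dumps can disagree about chunk ownership, as discussed in the comments above.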

Generated at Thu Feb 08 02:58:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.