Details
Question
Resolution: Done
Minor - P4
Server Triage
Description
I have a MongoDB sharded cluster with hybrid storage: some fast SSDs and some slower, cheaper spinning rust.
For archiving, I would like to move some data to the slower disk. We have to keep this data for legal reasons, but it is queried only occasionally.
In principle I would do it like this:
mongo --eval "sh.stopBalancer()" mongos-host:27017

# Repeat below on each shard host:
mongo --eval "db.fsyncLock()" localhost:27018

cp /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection/3109--6926861682361166404.wt
# ln takes TARGET first, then LINK_NAME: the link on the fast disk points at the copy on the slow disk
ln --force --symbolic /slow-disc/mongodb/collection/3109--6926861682361166404.wt /mongodb/data/collection/3109--6926861682361166404.wt

mongo --eval "db.fsyncUnlock()" localhost:27018

# After all shards are done:
mongo --eval "sh.startBalancer()" mongos-host:27017
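The copy-and-symlink step above could be wrapped in a small helper that refuses to replace the original unless the copy is byte-identical. This is only a sketch: `relocate_with_symlink` is a hypothetical name, it assumes absolute paths, and it assumes nothing writes to the source file while it runs. Whether `db.fsyncLock()` is enough to guarantee that for a running mongod is exactly the open question here.

```shell
#!/bin/sh
# Hypothetical helper (not part of MongoDB): copy a file to another disk,
# verify the copy, then replace the original with a symlink to the copy.
# Assumes absolute paths and that nothing writes to "$1" while this runs.
relocate_with_symlink() {
    src=$1   # file on the fast disk
    dst=$2   # destination on the slow disk

    mkdir -p "$(dirname "$dst")" || return 1
    cp -p "$src" "$dst" || return 1

    # Refuse to touch the original unless the copy is byte-identical.
    cmp -s "$src" "$dst" || return 1

    # ln TARGET LINK_NAME: the symlink is created at the old location.
    # Note: -f unlinks the original first, so the swap is not atomic.
    ln -sf "$dst" "$src"
}
```

It would be called once per data file, between the lock and unlock steps, for example `relocate_with_symlink /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection/3109--6926861682361166404.wt`.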
The indexes shall remain on the fast disk.
Would this be a reliable way to archive my data? What happens if the collection is read while it is being moved?
Another approach would be a file system layout like this:
/mongodb/data/collection
/mongodb/data/index
/mongodb/archive/collection -> /slow-disc/mongodb/collection
/mongodb/archive/index
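A layout like this could be prepared before the archive database is first written to. The sketch below assumes mongod runs with `storage.directoryPerDB` and `storage.wiredTiger.engineConfig.directoryForIndexes` enabled, which is my reading of the paths above rather than something stated here. `prepare_archive_layout` and its two arguments are hypothetical names, and directory ownership and permissions for the mongod user are left out.

```shell
#!/bin/sh
# Hypothetical helper: create the directory for the "archive" database so
# that collection files land on the slow disk while index files stay on
# the fast disk.
prepare_archive_layout() {
    data_root=$1   # e.g. /mongodb
    slow_root=$2   # e.g. /slow-disc/mongodb

    mkdir -p "$data_root/archive/index" "$slow_root/collection" || return 1

    # Only the collection directory is a symlink to the slow disk.
    if [ ! -e "$data_root/archive/collection" ]; then
        ln -s "$slow_root/collection" "$data_root/archive/collection" || return 1
    fi
}
```

With the paths from the listing above, the call would be `prepare_archive_layout /mongodb /slow-disc/mongodb` on each shard host.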
And then move the collection like this:
mongo --eval 'sh.shardCollection("archive.coll", shardKey)' mongos-host:27017
mongodump --uri "mongodb://mongos-host:27017" --db=data --collection=coll --archive=- | mongorestore --uri "mongodb://mongos-host:27017" --nsFrom="data.coll" --nsTo="archive.coll" --archive=-
mongo --eval 'db.getSiblingDB("data").getCollection("coll").drop()' mongos-host:27017
Main disadvantage: the balancer has to redistribute all of the data across the shards, which creates additional load on my sharded cluster.
Which approach would you recommend?