Type: Question
Resolution: Done
Priority: Minor - P4
Affects Version/s: None
Component/s: None
Server Triage
I have a MongoDB sharded cluster with hybrid storage, i.e. some fast SSDs and some slower, cheaper spinning rust.
For archiving, I would like to move some data to the slower disk. For legal reasons we have to keep this data, but it is queried only occasionally.
In principle I would do it like this:
mongo --eval "sh.stopBalancer()" mongos-host:27017
# Repeat the following on each shard host:
mongo --eval "db.fsyncLock()" localhost:27018
cp /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection/3109--6926861682361166404.wt
ln --force --symbolic /slow-disc/mongodb/collection/3109--6926861682361166404.wt /mongodb/data/collection/3109--6926861682361166404.wt
mongo --eval "db.fsyncUnlock()" localhost:27018
# After all shards are done:
mongo --eval "sh.startBalancer()" mongos-host:27017
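The copy-then-symlink step can be tried in isolation on throwaway files. The sketch below uses a hypothetical helper `relocate_with_symlink` and assumes GNU coreutils (for the long `ln` options). Note that `ln -s` takes the target first and the link name second. The sketch says nothing about MongoDB itself: a process that already has the original file open keeps using the original inode after its directory entry is replaced, so this does not show that doing it under a running mongod is safe.

```shell
#!/usr/bin/env bash
# Sketch of "copy the file, then replace it with a symlink", run on throwaway files.
# relocate_with_symlink is a hypothetical helper; GNU ln is assumed.
set -euo pipefail

relocate_with_symlink() {
  local src="$1"      # file to move, e.g. /mongodb/data/collection/<ident>.wt
  local dst_dir="$2"  # target directory, e.g. /slow-disc/mongodb/collection
  mkdir -p "$dst_dir"
  # Absolute target path, so the symlink does not depend on the current directory.
  local dst
  dst="$(cd "$dst_dir" && pwd)/$(basename "$src")"
  cp -p "$src" "$dst"                  # copy contents, keep mode/ownership/timestamps where permitted
  ln --force --symbolic "$dst" "$src"  # ln -s TARGET LINK_NAME: the link replaces $src
}

demo="$(mktemp -d)"
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/data/collection"
echo "payload" > "$demo/data/collection/3109--demo.wt"

relocate_with_symlink "$demo/data/collection/3109--demo.wt" "$demo/slow-disc/collection"

ls -l "$demo/data/collection"
cat "$demo/data/collection/3109--demo.wt"   # reads through the symlink: prints "payload"
```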
The indexes should remain on the fast disk.
Would this be a reliable way to archive my data? What happens if the collection is read during the move?
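Which `.wt` file holds the collection and which hold its indexes can be looked up per shard, since each mongod generates its own idents and the file name will generally differ from shard to shard. The sketch below assumes that `collStats` on a shard reports `wiredTiger.uri` (and `indexDetails.<name>.uri`) in the form `statistics:table:<ident>`, and that the data file is `<dbPath>/<ident>.wt`. `wt_uri_to_path` is a hypothetical helper and the index ident is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a WiredTiger URI, as reported by collStats on a shard,
# to the .wt file under that shard's dbPath.
set -euo pipefail

wt_uri_to_path() {
  local dbpath="$1" uri="$2"
  # URI looks like "statistics:table:<ident>"; the file is <dbPath>/<ident>.wt
  printf '%s/%s.wt\n' "${dbpath%/}" "${uri#statistics:table:}"
}

# On a real shard the URIs could be read with the legacy shell (not run here), e.g.:
#   mongo --quiet --eval 'var s = db.getSiblingDB("data").coll.stats();
#     print(s.wiredTiger.uri); for (var i in s.indexDetails) print(s.indexDetails[i].uri)' localhost:27018
# Example values, assuming dbPath /mongodb with --directoryperdb and --wiredTigerDirectoryForIndexes:
coll_uri="statistics:table:data/collection/3109--6926861682361166404"
idx_uri="statistics:table:data/index/3110--6926861682361166404"   # made-up index ident

wt_uri_to_path /mongodb "$coll_uri"   # /mongodb/data/collection/3109--6926861682361166404.wt
wt_uri_to_path /mongodb "$idx_uri"    # /mongodb/data/index/3110--6926861682361166404.wt
```

Only the file under `collection/` would be relocated; the ones under `index/` stay on the SSD.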
Another approach would be a file system layout like this:
/mongodb/data/collection
/mongodb/data/index
/mongodb/archive/collection -> /slow-disc/mongodb/collection
/mongodb/archive/index
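This layout matches what mongod creates when it runs with `--directoryperdb` and `--wiredTigerDirectoryForIndexes` on dbPath `/mongodb` (an assumption; as far as I know both options have to be in place when the data files are created rather than toggled later). The sketch builds the same tree under a temporary root and shows that a file created through the symlinked `archive/collection` directory lands on the slow disk. On a real host that directory must also be writable by the mongod user.

```shell
#!/usr/bin/env bash
# Sketch of the proposed layout, built under a temporary root so it can run anywhere.
# Assumption: mongod --dbpath /mongodb --directoryperdb --wiredTigerDirectoryForIndexes ...
# so database "data" lives in /mongodb/data and database "archive" in /mongodb/archive.
set -euo pipefail

root="$(mktemp -d)"   # stands in for "/" on a real host
trap 'rm -rf "$root"' EXIT

mkdir -p "$root/mongodb/data/collection" "$root/mongodb/data/index" \
         "$root/mongodb/archive/index" "$root/slow-disc/mongodb/collection"

# Only the archive's collection files go to the slow disk; its indexes stay on the SSD.
ln --symbolic "$root/slow-disc/mongodb/collection" "$root/mongodb/archive/collection"

# A file created through the link ends up on the slow disk.
touch "$root/mongodb/archive/collection/example.wt"
ls -l "$root/mongodb/archive" "$root/slow-disc/mongodb/collection"
```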
And then move the collection like this:
mongo --eval 'sh.shardCollection("archive.coll", shardKey)' mongos-host:27017
mongodump --uri "mongodb://mongos-host:27017" --db=data --collection=coll --archive=- \
  | mongorestore --uri "mongodb://mongos-host:27017" --nsFrom="data.coll" --nsTo="archive.coll" --archive=-
mongo --eval 'db.getSiblingDB("data").getCollection("coll").drop()' mongos-host:27017
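The same three steps can be wrapped in a script that only drops `data.coll` after a basic check. This is a minimal sketch, assuming the legacy `mongo` shell and `mongodump`/`mongorestore` on `PATH`, MongoDB 4.0 or newer (for `countDocuments`), no writes to `data.coll` during the copy, and a placeholder shard key (`SHARD_KEY` is hypothetical). Equal document counts are a coarse guard, not a content comparison. By default it runs in dry-run mode and only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: copy data.coll to archive.coll through mongos, drop the source only if
# the document counts match. SHARD_KEY is a placeholder, not a recommendation.
set -euo pipefail

MONGOS="${MONGOS:-mongos-host:27017}"
URI="mongodb://$MONGOS"
SHARD_KEY='{ _id: "hashed" }'   # placeholder: use the real shard key
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = run them

run() {  # print the command in dry-run mode, execute it otherwise
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

count_docs() {  # $1 = database, $2 = collection; prints the document count via mongos
  mongo --quiet --eval "print(db.getSiblingDB('$1').getCollection('$2').countDocuments({}))" "$MONGOS"
}

# Depending on the version, sh.enableSharding() may be needed before shardCollection.
run mongo --eval "sh.enableSharding('archive'); sh.shardCollection('archive.coll', $SHARD_KEY)" "$MONGOS"
run bash -c "set -o pipefail; mongodump --uri '$URI' --db=data --collection=coll --archive=- | mongorestore --uri '$URI' --nsFrom='data.coll' --nsTo='archive.coll' --archive=-"

# Drop the source only when both namespaces report the same number of documents.
if [ "$DRY_RUN" = 1 ] || [ "$(count_docs data coll)" = "$(count_docs archive coll)" ]; then
  run mongo --eval "db.getSiblingDB('data').getCollection('coll').drop()" "$MONGOS"
else
  echo "document counts differ; data.coll was not dropped" >&2
  exit 1
fi
```

Running it with `DRY_RUN=0` executes the commands; the default only prints them for review.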
Main disadvantage: the balancer has to distribute all of the data across the shards, which creates additional load on my sharded cluster.
Which approach would you recommend?