Core Server / SERVER-55866

Is it possible to move WiredTiger files to a different file system?

    • Type: Question
    • Resolution: Done
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Server Triage

      I have a MongoDB sharded cluster with hybrid storage, i.e. some fast SSDs and some slower, cheaper spinning rust.

      For archiving, I would like to move some data to the slower disc. We have to keep this data for legal reasons, but it is queried only occasionally.

      In principle I would do it like this:

      mongo --eval "sh.stopBalancer()" mongos-host:27017
      
      # Repeat below on each shard host:
      mongo --eval "db.fsyncLock()" localhost:27018
      
      cp /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection/3109--6926861682361166404.wt
      # ln takes the link target first, then the link name
      ln --force --symbolic /slow-disc/mongodb/collection/3109--6926861682361166404.wt /mongodb/data/collection/3109--6926861682361166404.wt
      
      mongo --eval "db.fsyncUnlock()" localhost:27018
      
      # After all shards are done:
      mongo --eval "sh.startBalancer()" mongos-host:27017
      

      The indexes should remain on the fast disc.
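
The copy-and-symlink step could be wrapped in a small function that verifies the copy before the original is replaced. This is only a sketch: `relocate_with_symlink` is a hypothetical helper name, and it assumes absolute paths and that nothing writes to the file while it runs. It does not answer whether a running mongod tolerates having one of its files swapped this way.

```shell
# Hypothetical helper, not part of MongoDB: copy one file into another
# directory, verify the copy byte for byte, then replace the original
# with a symbolic link that points at the copy.
# Assumptions: both paths are absolute, and nothing writes to the file
# while this runs (for example mongod is stopped or locked).
relocate_with_symlink() {
    local src="$1"        # e.g. /mongodb/data/collection/<file>.wt
    local dest_dir="$2"   # e.g. /slow-disc/mongodb/collection
    local dest
    dest="$dest_dir/$(basename "$src")"

    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest" || return 1    # -p keeps mode and timestamps

    # Abort before touching the original if the copy is not identical.
    if ! cmp -s "$src" "$dest"; then
        echo "copy of $src differs from the original, aborting" >&2
        return 1
    fi

    # ln takes the link TARGET first and the LINK NAME second;
    # -f removes the original file so the link can take its place.
    ln -sf "$dest" "$src"
}
```

It would be called once per data file between `db.fsyncLock()` and `db.fsyncUnlock()`, for example `relocate_with_symlink /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection`.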

      Would this be a reliable way to archive my data? What happens if the collection is read while it is being moved?

       

      Another approach would be a file system layout like this:

      /mongodb/data/collection
      /mongodb/data/index
      /mongodb/archive/collection -> /slow-disc/mongodb/collection 
      /mongodb/archive/index
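
A layout like this could be prepared before the archive database is created. The sketch below assumes that mongod runs with `storage.directoryPerDB` and `storage.wiredTiger.engineConfig.directoryForIndexes` enabled, which is what produces one `collection` and one `index` directory per database; the post does not state this, so treat it as an assumption. `prepare_archive_layout` is a hypothetical helper name.

```shell
# Hypothetical helper: pre-create the directory of the "archive" database
# so that its collection files land on the slow disc while its index
# files stay under the regular dbPath.
# Assumptions: mongod uses storage.directoryPerDB and
# storage.wiredTiger.engineConfig.directoryForIndexes, the "archive"
# database does not exist yet, and the directories end up owned by the
# user that runs mongod.
prepare_archive_layout() {
    local dbpath="$1"      # e.g. /mongodb
    local slow_root="$2"   # e.g. /slow-disc/mongodb
    local link="$dbpath/archive/collection"

    # Refuse to touch a real directory that may already hold data files.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symlink" >&2
        return 1
    fi

    mkdir -p "$slow_root/collection" "$dbpath/archive/index" || return 1

    # -n replaces an existing symlink instead of creating a link inside
    # the directory it points to.
    ln -sfn "$slow_root/collection" "$link"
}
```

For the paths shown above, the call would be `prepare_archive_layout /mongodb /slow-disc/mongodb`.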
      

      And then move the collection like this:

      mongo --eval 'sh.shardCollection("archive.coll", shardKey)' mongos-host:27017
      mongodump --uri "mongodb://mongos-host:27017" --db=data --collection=coll --archive=- | mongorestore --uri "mongodb://mongos-host:27017" --nsFrom="data.coll" --nsTo="archive.coll" --archive=-
      mongo --eval 'db.getSiblingDB("data").getCollection("coll").drop()' mongos-host:27017
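
Because the final drop cannot be undone, the same steps could be guarded by a document count comparison. This is a sketch under assumptions: `archive_collection` is a hypothetical helper name, `archive.<coll>` is assumed to be sharded already, the legacy `mongo` shell is assumed to support `countDocuments()`, and nothing is assumed to write to the source collection during the move.

```shell
# Hypothetical helper: copy data.<coll> to archive.<coll> through a
# mongodump | mongorestore pipe and drop the source only when both
# collections report the same number of documents.
# Assumptions: archive.<coll> is already sharded, the mongo shell
# supports countDocuments(), and the source receives no writes meanwhile.
archive_collection() {
    local uri="$1"    # e.g. mongodb://mongos-host:27017
    local coll="$2"   # e.g. coll
    local src dst

    mongodump --uri "$uri" --db=data --collection="$coll" --archive=- |
        mongorestore --uri "$uri" --nsFrom="data.$coll" \
            --nsTo="archive.$coll" --archive=- || return 1

    src=$(mongo --quiet --eval \
        "db.getSiblingDB('data').getCollection('$coll').countDocuments({})" "$uri") || return 1
    dst=$(mongo --quiet --eval \
        "db.getSiblingDB('archive').getCollection('$coll').countDocuments({})" "$uri") || return 1

    # Keep the source unless both counts are present and equal.
    if [ -z "$src" ] || [ "$src" != "$dst" ]; then
        echo "count mismatch (data: $src, archive: $dst), source kept" >&2
        return 1
    fi

    mongo --quiet --eval \
        "db.getSiblingDB('data').getCollection('$coll').drop()" "$uri"
}
```

A matching count is only a coarse check. It would catch an interrupted restore, but not documents that differ in content.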
      

      Main disadvantage: the balancer has to redistribute all of the data across the shards, which creates additional load on my sharded cluster.

      Which approach would you recommend?

            Assignee:
            backlog-server-triage [HELP ONLY] Backlog - Triage Team
            Reporter:
            wernfried.domscheit@sunrise.net Wernfried Domscheit