[DOCS-3847] Instructions for (MMS) Restoring a Replica Set from Backup don't actually work Created: 31/Jul/14  Updated: 12/Dec/18  Resolved: 02/Dec/14

Status: Closed
Project: Documentation
Component/s: Cloud Manager
Affects Version/s: None
Fix Version/s: v1.3.11, v1.3.15

Type: Bug Priority: Blocker - P1
Reporter: Victor Hooi Assignee: Bob Grabar
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OSX 10.9.4
MongoDB 2.6.3



 Description   

I have just tried to run through the instructions for restoring a replica set from a downloaded MMS backup (http://mms.mongodb.com/help/tutorial/restore-replica-set/).

I am running MongoDB 2.6.3 on OSX 10.9.4.

I have started a restore job on my MMS group, and then downloaded the tar.gz file via HTTPS.

I have created a new, empty replica set, and then I follow step 1, "Shut down the entire replica set".

I then follow steps 2 through 4. However, when I attempt to run the seedSecondary.sh script in step 4, it fails with:

./seedSecondary.sh 27017 1.8
MongoDB shell version: 2.6.3
connecting to: 127.0.0.1:27017/test
2014-07-31T13:15:07.883+1000 collection already exists

The script appears to be attempting to create the oplog.rs collection, which I believe already exists in my replica set.

I increased my logging to logLevel 2 and then ran the command again to confirm this:

2014-07-31T13:20:44.663+1000 [initandlisten] connection accepted from 127.0.0.1:62066 #6 (2 connections now open)
2014-07-31T13:20:44.663+1000 [conn6] run command admin.$cmd { whatsmyuri: 1 }
2014-07-31T13:20:44.664+1000 [conn6] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:62 0ms
2014-07-31T13:20:44.665+1000 [conn6] run command local.$cmd { create: "oplog.rs", capped: true, size: 1932735283.2 }
2014-07-31T13:20:44.665+1000 [conn6] create collection local.oplog.rs { capped: true, size: 1932735283.2 }
2014-07-31T13:20:44.665+1000 [conn6] command local.$cmd command: create { create: "oplog.rs", capped: true, size: 1932735283.2 } keyUpdates:0 numYields:0 locks(micros) w:163 reslen:75 0ms
2014-07-31T13:20:44.667+1000 [conn6] SocketException: remote: 127.0.0.1:62066 error: 9001 socket exception [CLOSED] server [127.0.0.1:62066]
2014-07-31T13:20:44.667+1000 [conn6] end connection 127.0.0.1:62066 (1 connection now open)
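
For reference, the create call in the log corresponds to something like the following in the mongo shell. This is a hypothetical reconstruction of the script's first step, not the script's actual contents; seedSecondary.sh also goes on to insert a seed oplog entry, which is the write that succeeds below:

    // Create the capped oplog, sized from the script's second argument
    // (1.8 GB = 1.8 * 1024^3 bytes = 1932735283.2, matching the log above).
    // createCollection fails with "collection already exists" when
    // local.oplog.rs is already present, which is exactly the error shown.
    var sizeBytes = 1.8 * 1024 * 1024 * 1024;
    db.getSiblingDB("local").createCollection("oplog.rs", { capped: true, size: sizeBytes });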

I then delete all of my local database files (local.0, local.1 and local.ns), and retry step 4:

./seedSecondary.sh 27017 1.8
MongoDB shell version: 2.6.3
connecting to: 127.0.0.1:27017/test
WriteResult({ "nInserted" : 1 })

Step 4 now appears to be successful (although I'm curious whether there's a better way of working around this). I continue with steps 5 and 6; however, I seem to hit an error at step 6 when it asks me to run rs.initiate():

mongo
MongoDB shell version: 2.6.3
connecting to: test
Server has startup warnings:
2014-07-31T13:30:57.659+1000 [initandlisten]
2014-07-31T13:30:57.660+1000 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
> rs.initiate()
{
        "ok" : 0,
        "errmsg" : "local.oplog.rs is not empty on the initiating member.  cannot initiate."
}
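
For what it's worth, the blocking state can be confirmed from the shell (output illustrative):

    > use local
    > db.oplog.rs.count()
    1

rs.initiate() refuses to run while local.oplog.rs has entries on the initiating member, and an entry is exactly what the seeding in step 4 creates.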

Would it be possible to fix the instructions in step 1 as well as step 6, please?



 Comments   
Comment by Githook User [ 12/Dec/18 ]

Author: Anthony Sansone (atsansone) <tony.sansone@mongodb.com>

Message: (DOCS-3847): Add note on s390x support.
Branch: master
https://github.com/10gen/mms-docs/commit/37bb3a7b0963f83be5568ca9ad82e4c87ab6ee16

Comment by Githook User [ 02/Dec/14 ]

Author: Bob Grabar (bgrabar) <bob.grabar@10gen.com>

Message: DOCS-3847 restore a replica set
Branch: master
https://github.com/10gen/mms-docs/commit/c8a61d256f2883b9d4d0c626747f4831815dc371

Comment by Randy Tyler [X] [ 24/Oct/14 ]

I only used that method because I couldn't get the instructions from the manual to work. I got authorization errors running seedSecondary.sh. The instructions just weren't clear to me on how to complete the restore. Since I did copy the restore file to all members in my method, do you think all I’m really missing is the seedSecondary.sh step?

Comment by Peter Garafano (Inactive) [ 24/Oct/14 ]

Hi Randy,

Jakov's method will certainly work, and for small data sets it is probably easier than the method we have described in our docs. However, this method will always require the secondary nodes to do a full resync, which means they will wipe their database files and perform an initial sync from the primary. For large data sets, the process we have described is crucial: an initial sync of several terabytes of data is very costly.

Following Jakov's steps, the secondary nodes will not be able to correlate their data with the data on the primary due to a lack of common oplog entries; this will cause the secondaries to do a full resync regardless of whether you seeded them from the backup. The seedSecondary.sh script inserts an oplog entry into each member of the replica set, giving them a common point from which they can begin syncing.
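
For illustration, after seeding, every member ends up carrying the same final oplog entry, along these lines (the field values here are made up; the script derives the timestamp from the backup's restore point):

    > db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1)
    { "ts" : Timestamp(1414099800, 1), "h" : NumberLong("6743730372036745118"), "op" : "n", "ns" : "", "o" : { "msg" : "seed" } }

Because the members report the same last entry, they can agree on a common sync point instead of resyncing from scratch.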

Since we provide a copy of the data that can be used across all nodes in a replica set, why waste time and burden the primary with the initial sync process? It is usually more efficient to spend some time up front using the seedSecondary.sh script to provide the required common oplog entries and bring the entire replica set up at once.
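
Concretely, the up-front cost is one script run per member, against that member's standalone mongod, before re-initiating the set (arguments as in the tutorial, port and oplog size in GB; values illustrative):

    $ ./seedSecondary.sh 27017 1.8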

I hope this makes a bit more sense now.

-Pete

Comment by Randy Tyler [X] [ 24/Oct/14 ]

I found Jakov's plan of attack easier to understand than the tutorial. I did one thing differently, which was to copy and unpack the MMS restore files to all members of the replica set (in hopes that I won't need a resync). I did not run the seedSecondary.sh script that came with the restore files. The procedure below worked pretty well for me.

1) Generate the restore file from MMS

2) Stop all mongod instances in the replica set

$ mongod --shutdown --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031
$ ssh vxpid-hmgsha02 ". ~/.mongo_env ; mongod --shutdown --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031"
$ ssh vxpid-hmgsha03 ". ~/.mongo_env ; mongod --shutdown --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031"

3) Carefully unpack the restore tarball on all nodes in the replica set

  1. Unpack the MMS restore tarball...
    $ cd /mongo/dumps
    $ tar -xvf mgsha.pdh.27031-1414099800-4893221731705857100.tar.gz
    $ scp mgsha.pdh.27031-1414099800-4893221731705857100.tar.gz vxpid-hmgsha02:/mongo/dumps
    $ scp mgsha.pdh.27031-1414099800-4893221731705857100.tar.gz vxpid-hmgsha03:/mongo/dumps
    $ ssh vxpid-hmgsha02 "cd /mongo/dumps ; tar -xvf mgsha.pdh.27031-1414099800-4893221731705857100.tar.gz"
    $ ssh vxpid-hmgsha03 "cd /mongo/dumps ; tar -xvf mgsha.pdh.27031-1414099800-4893221731705857100.tar.gz"
  2. Remove database data files, but not the sub-directories...
    $ rm /mongo/data01/DB/*/*
    $ rm /mongo/data01/DB/*
    $ ssh vxpid-hmgsha02 "rm /mongo/data01/DB/*/*"
    $ ssh vxpid-hmgsha02 "rm /mongo/data01/DB/*"
    $ ssh vxpid-hmgsha03 "rm /mongo/data01/DB/*/*"
    $ ssh vxpid-hmgsha03 "rm /mongo/data01/DB/*"
  3. Move the restored database data files to their proper locations...
    $ cd /mongo/dumps/1414099800
    $ mv admin* /mongo/data01/DB/admin/
    $ mv mongoMED_db* /mongo/data01/DB/mongoMED_db/
    $ ssh vxpid-hmgsha02 "mv /mongo/dumps/1414099800/admin* /mongo/data01/DB/admin/"
    $ ssh vxpid-hmgsha02 "mv /mongo/dumps/1414099800/mongoMED_db* /mongo/data01/DB/mongoMED_db/"
    $ ssh vxpid-hmgsha03 "mv /mongo/dumps/1414099800/admin* /mongo/data01/DB/admin/"
    $ ssh vxpid-hmgsha03 "mv /mongo/dumps/1414099800/mongoMED_db* /mongo/data01/DB/mongoMED_db/"

4) Start each instance in the replica set normally

$ mongod --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031
$ ssh vxpid-hmgsha02 ". ~/.mongo_env ; mongod --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031"
$ ssh vxpid-hmgsha03 ". ~/.mongo_env ; mongod --config /mongo/install/conf2/mongodb.conf.mgsha.pdh.27031"

5) Recreate the replica set

> rs.initiate();
> rs.add({ _id: 1, host: "vxpid-hmgsha02:27031", priority: 1 });
> rs.add({ _id: 2, host: "vxpid-hmgsha03:27031", priority: 1 });
> rs.conf()
> rs.status();

Comment by Githook User [ 20/Oct/14 ]

Author: Bob Grabar (bgrabar) <bob.grabar@10gen.com>

Message: DOCS-3847 corrections to steps for restoring a replica set
Branch: v1.5
https://github.com/10gen/mms-docs/commit/3073ee4cf5855b53faf2ca6b976b4ddbd18e5966

Comment by Githook User [ 10/Sep/14 ]

Author: Bob Grabar (bgrabar) <bob.grabar@10gen.com>

Message: DOCS-3847 corrections to steps for restoring a replica set
Branch: master
https://github.com/10gen/mms-docs/commit/53a9ca966900bfae523bc01d560627b83dd9251d

Comment by Jakov Sosic [ 03/Sep/14 ]

I've fixed this problem by doing the restore in the following manner (I ignored the online documentation, which is obviously flawed):

  • download backup file
  • unpack backup file
  • stop all mongo replica nodes
  • empty all mongo data directories
  • start primary
  • run the following on primary:

    [root@primary] # mongo localhost:27017
    > rsconf={"_id": "my.replica.name", "members": [{"_id": 1, "host": "<current node hostname>:27017"}]}
    > rs.initiate(rsconf);

After that, start mongod on the secondary, and run the following on the primary:

[root@primary] # mongo localhost:27017
> rs.add("<secondary node>:27017")

Repeat for each secondary you have. Note, however, that this procedure will force all the secondaries to do a full sync from scratch.
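
For example, with two secondaries (hostnames illustrative):

[root@primary] # mongo localhost:27017
> rs.add("secondary-1.example.com:27017")
> rs.add("secondary-2.example.com:27017")
> rs.status()

rs.status() will show the new members in STARTUP2 while their full initial sync runs.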

Comment by Jakov Sosic [ 02/Sep/14 ]

I'm hitting the same issue. What did you do in the last step? How did you circumvent the rs.initiate() error?
