[SERVER-19796] Cannot stop and backup hidden secondary without risk of inconsistent data. Created: 06/Aug/15  Updated: 06/Aug/15  Resolved: 06/Aug/15

Status: Closed
Project: Core Server
Component/s: Admin, Replication
Affects Version/s: 2.4.14
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dave Muysson Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04


Operating System: Linux
Steps To Reproduce:

Using a replicated MongoD environment with one Primary, one Secondary, and one Hidden Secondary:
(backup process)

  • Issue "db.adminCommand( {shutdown : 1} )" to the Hidden Secondary

  • Clone the data under the database folder to an alternate location
  • Start the Hidden Secondary and allow it to catch up.

(restore process)
1. Stop all MongoD hosts under the replicaset
2. Remove the database data on all three MongoD hosts
3. On the Hidden Secondary, restore the backup data to its original location
4. Start the Hidden Secondary MongoD process
5. Force a new config via rs.reconfig() which only includes the hidden secondary
6. Add the original Primary and Secondary to the replicaset to replicate back
7. Reconfigure the replicaset back to Primary, Secondary, Hidden Secondary.

This will work 95% of the time. The other 5%, you cannot get past step 4 of the restore because the data that was backed up was still behind. The shutdown command did not wait for the hidden secondary to be caught up to the primary before it shut down.
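Step 5 of the restore process can be sketched as follows. This is only an illustration, not the reporter's actual script: the helper name singleMemberConfig and the host name in the usage comment are hypothetical, and the sketch deliberately leaves the kept member's own settings (such as hidden and priority) untouched.

```javascript
// Hypothetical helper: build a replica set config that keeps only one member.
// `config` is a document shaped like the result of rs.conf().
function singleMemberConfig(config, keepHost) {
  var kept = config.members.filter(function (m) {
    return m.host === keepHost;
  });
  if (kept.length !== 1) {
    throw new Error("expected exactly one member with host " + keepHost);
  }
  // Shallow-copy the top-level fields so the input document is not modified.
  var copy = {};
  Object.keys(config).forEach(function (key) {
    copy[key] = config[key];
  });
  copy.members = kept;
  return copy;
}

// Usage in the mongo shell on the restored hidden secondary
// (the host name is an assumption for illustration):
//   var cfg = singleMemberConfig(rs.conf(), "hidden.example.net:27017");
//   rs.reconfig(cfg, { force: true });
```

The force option is needed here because no primary is available to accept a normal reconfiguration.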

Participants:

 Description   

Our offsite backup process operates by stopping the MongoD processes on our Hidden Secondary hosts, then capturing their data by cloning their database folder and copying it offsite.

The restore process works by stopping the MongoD process, removing the existing data within the database folder, then cloning the offsite data back into the database folder.

On several occasions now we have had the restored MongoD process never get out of 'startup2' as it believes there is data still to be replicated.

We have followed the documented recommendations for stopping a replica host by issuing "db.adminCommand( {shutdown : 1} )" to the MongoD process instead of stopping the process through upstart.

The documentation for the shutdown command states that it will not run unless "a" secondary has caught up with the primary. If the wording matches the logic, I suspect that the shutdown command proceeds with stopping the Hidden Secondary MongoD host, even when it is behind, because "A" Secondary is up to date (i.e. the non-hidden secondary within the replicaset).



 Comments   
Comment by Ramon Fernandez Marina [ 06/Aug/15 ]

dave.muysson@360pi.com, the documentation states the following about the shutdown command:

If the node you’re trying to shut down is a replica set primary, then the command will succeed only if there exists a secondary node whose oplog data is within 10 seconds of the primary.

When the command is run on a secondary there's no wait period until that secondary is caught up. To follow the backup procedure described above you may need to wait until the hidden secondary is caught up. There are also other backup procedures you may want to consider in case they better fit your needs.
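As a rough sketch of what such a wait could look like, the helper below estimates the lag of the current member from a document shaped like the output of rs.status(). The function names and the threshold are assumptions made for illustration. The primary's optimeDate as seen from a secondary comes from the last heartbeat, so the result is only approximate.

```javascript
// Hypothetical helper: seconds by which this member trails the primary,
// or null if the primary or this member cannot be found in the status.
function replicationLagSeconds(status) {
  var primary = null;
  var self = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") { primary = m; }
    if (m.self) { self = m; }
  });
  if (!primary || !self) {
    return null;
  }
  return (primary.optimeDate.getTime() - self.optimeDate.getTime()) / 1000;
}

// Hypothetical helper: true only when the lag is known and within the limit.
function safeToShutDown(status, maxLagSeconds) {
  var lag = replicationLagSeconds(status);
  return lag !== null && lag <= maxLagSeconds;
}

// Usage in the mongo shell, connected to the hidden secondary
// (the 1 second threshold is an arbitrary example value):
//   while (!safeToShutDown(rs.status(), 1)) { sleep(1000); }
//   db.adminCommand({ shutdown: 1 });
```

On a set that is taking writes continuously the lag may never reach exactly zero, which is why the sketch uses a threshold rather than strict equality.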

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server, and unfortunately we're not able to provide support here. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.
