[SERVER-3679] Point in Time Recovery Created: 23/Aug/11  Updated: 29/Feb/12  Resolved: 03/Sep/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Kenny Gorman Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

We would like point-in-time recovery for MongoDB. For instance, each time a journal is recycled, spool it to a file first, with a parameter controlling its destination. Allow recovery to read from this destination if it is configured and the files are there. A scheme needs to be created that allows recovery to be brought up to some point in time and then stopped. There are analogs in MySQL, PostgreSQL, Oracle, etc.

http://dev.mysql.com/doc/refman/5.1/en/point-in-time-recovery.html

The enterprise use case is, say, a horrible piece of code was pushed to the live site. It deleted data or something, and all the slaves consumed the change. The lagged standby is 8 hours old, but the problem is only 1 hour old. It would be a shame to have to logically roll back 7 hours of otherwise OK operations.

One could potentially do this manually today by opening the slave, essentially deleting oplog entries (recreating the oplog with only the entries you want), and then removing slaveDelay, but it's hacky/risky/tricky and I suspect it could confuse the heck out of the replica set. Point-in-time recovery could potentially be built on slaveDelay if one could recover up to a point and then become master. I could envision a series of replica set commands that allow for this, so that's a potentially different approach to getting point-in-time recovery functionality.
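The core of the "recreate the oplog with only the entries you want" step can be sketched in plain JavaScript. This is a hypothetical illustration on plain objects, not code you would run against a live replica set; the numeric ts values stand in for real BSON Timestamps, and trimOplog is an invented helper name.

```javascript
// Hypothetical sketch: rebuild an oplog stream so it stops just before a bad
// operation. Timestamps are simple numbers standing in for BSON Timestamps.
function trimOplog(entries, badTs) {
  // Keep only operations that happened strictly before the bad change.
  return entries.filter(function (e) { return e.ts < badTs; });
}

var oplog = [
  { ts: 100, op: "i", o: { _id: 1 } },  // good insert
  { ts: 200, op: "i", o: { _id: 2 } },  // good insert
  { ts: 300, op: "d", o: { _id: 1 } }   // the bad delete pushed by the new code
];

// Everything before ts 300 survives; the bad delete is dropped.
var recovered = trimOplog(oplog, 300);
```

In practice the hard part is exactly what the comment above warns about: doing this against a real capped oplog collection, by hand, without confusing the rest of the replica set.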



 Comments   
Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

You can do this all today as Scott said.
There is room for a nice tool at some point...

Comment by Kenny Gorman [ 23/Aug/11 ]

The no-op approach is perfect.

In terms of mongodump, I suspect most folks back up at the file level rather than with mongodump/restore, so an analog would need to exist there to allow for this to happen. Also, how would one keep oplog entries that would otherwise be overwritten? Thus my point about spooling those files out to a destination.

Comment by Scott Hernandez (Inactive) [ 23/Aug/11 ]

You can turn any operation into a no-op (by just updating it); this will not remove the entry, but you can alter oplog documents as long as they do not increase in size.
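The no-op rewrite can be sketched as follows. This is a hypothetical illustration on a plain object, not a live oplog update; neutralize is an invented helper name, though "n" is the oplog's actual no-op operation type, and the caveat about document size from the comment above still applies.

```javascript
// Hypothetical sketch: neutralize an oplog entry by rewriting it in place as
// a no-op. In a real capped oplog the rewritten document must not be larger
// than the original, or the in-place update is not possible.
function neutralize(entry) {
  entry.op = "n";   // "n" is the oplog no-op operation type
  entry.o = {};     // drop the payload; the document only shrinks
  return entry;
}

var badOp = { ts: 300, op: "d", ns: "app.users", o: { _id: 1 } };
neutralize(badOp);  // the delete is now inert when the oplog is replayed
```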

New versions of mongodump/restore may allow you to specify an oplog ts to replay to. https://jira.mongodb.org/browse/SERVER-3265

Comment by Kenny Gorman [ 23/Aug/11 ]

I suppose another approach would be to sneak the delayed slave up to the point of the failure but not past it. That's hard to do manually.

Perhaps an option on the replica set with a maximum timestamp to recover to would work as well. Something like

{maxRecovery: <ts>}

could be sent to rs.reconfig(), and any oplog entries beyond that point in time would always be ignored.

Of course, if someone deleted an important collection, the very best option would be to be able to issue a db.oplog.rs.remove({<the bad operation>}). This can get tricky if subsequent operations touch that object, and capped collections don't allow deletes. Perhaps add a capped-collection-to-normal-collection conversion that could be run before this, so at least there are manual ways to 'fix' your oplog stream.

Generated at Thu Feb 08 03:03:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.