[SERVER-8511] Live oplog can be dropped Created: 11/Feb/13  Updated: 05/Apr/16  Resolved: 10/Nov/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.8.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Derick Rethans
Assignee: Eric Milkie
Resolution: Done
Votes: 2
Labels: oplog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File IMAG1681.jpg    
Issue Links:
Duplicate
is duplicated by SERVER-14222 Replica set members should disallow d... Closed
Related
related to SERVER-11383 Prevent copydb to/from local database Closed
is related to SERVER-13483 segmentation fault after dropping oplog Closed
Operating System: ALL
Steps To Reproduce:

var rt = new ReplSetTest({nodes:2, useHostName:false, oplogSize:2})
rt.startSet()
rt.initiate()
var mFoo = rt.getMaster().getDB("test").foo;
var sFoo = rt.getSecondary().getDB("test").foo;
var sOplog = rt.getSecondary().getDB("local").oplog.rs
var mOplog = rt.getPrimary().getDB("local").oplog.rs
var counts = function () { return {m:mFoo.count(), s:sFoo.count() }}
mFoo.insert({})
mFoo.insert({});mFoo.getDB().getLastError(2);
printjson(sOplog.find().sort({$natural:-1}).limit(-1).next())
printjson(sOplog.drop())
printjson(counts())
mFoo.insert({});mFoo.getDB().getLastError();
printjson(counts())
sOplog.getDB().createCollection("oplog.rs", {capped:true, size:500000})
mFoo.insert({});mFoo.getDB().getLastError(2);
var newEntry = sOplog.find().sort({$natural:-1}).limit(-1).next()
printjson(newEntry)
printjson(counts())

printjson(sOplog.drop())
mFoo.insert({});mFoo.getDB().getLastError();
printjson(counts())
mOplog.getDB().createCollection("oplog.rs", {capped:true, size:500000})
mFoo.insert({});mFoo.getDB().getLastError();
printjson(counts())
rt.stopSet()

Participants:

 Description   

During the training, scotthernandez managed to drop the oplog on a slaveDelay secondary that was active at the time, using:

db.getSiblingDB("local").oplog.rs.drop()

After the drop, running

db.getSiblingDB("local").oplog.rs.stats()

returns:

{ "ok" : 0, "errmsg" : "ns not found" }



 Comments   
Comment by Githook User [ 11/Nov/14 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-8511 move oplog-dropping test to noPassthrough suite
Branch: master
https://github.com/mongodb/mongo/commit/93aaab7e2ca64e330959c403b36d50fdb609e86e

Comment by Githook User [ 11/Nov/14 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-8511 Disable oplog dropping test in query_oplogreplay.js
Branch: master
https://github.com/mongodb/mongo/commit/92d967e8cd4d32f3b3348c89320f238df47a1454

Comment by Githook User [ 10/Nov/14 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-8511 prohibit local db and oplog collection from being dropped while replication is active
Branch: master
https://github.com/mongodb/mongo/commit/e765b6a3544b325b52c270c3743827fa963b0d10
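
The rule stated in the commit message above can be sketched as a small predicate. This is an illustration only, not the server's actual code: the function name `isDropAllowed`, its signature, and the convention that a `null` collection name means "drop the whole database" are all assumptions made for the example.

```javascript
// Illustrative sketch only; NOT the code from the commit above.
// isDropAllowed and its parameters are hypothetical names.
// collName === null is used here to mean "drop the entire database".
function isDropAllowed(dbName, collName, replicationActive) {
    // With replication inactive (e.g. a standalone node), nothing is blocked.
    if (!replicationActive) {
        return true;
    }
    // Only the "local" database is protected by this rule.
    if (dbName !== "local") {
        return true;
    }
    // Block dropping the local database itself or the oplog collection.
    return !(collName === null || collName === "oplog.rs");
}
```

Under these assumptions, `isDropAllowed("local", "oplog.rs", true)` is false, while the same call with `replicationActive` set to false is allowed.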

Comment by Andy Schwerin [ 06/Mar/14 ]

andreas.nilsson@10gen.com, I think so. The builtin roles allowing write on the local database do allow dropping collections in that database, but "drop collection" has its own action type, so you could use user-defined roles (UDR) to prevent writers to local from dropping collections there.
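
A minimal sketch of the user-defined role idea described above: a role document that grants write actions on the local database but leaves out `dropCollection`. The role name `localWriterNoDrop`, the exact action list, and the helper `grantsAction` are assumptions made for illustration; they do not come from the ticket.

```javascript
// Hypothetical role: write access to "local" without the dropCollection action.
// The role name and the chosen action list are assumptions for this example.
var localWriterNoDrop = {
    role: "localWriterNoDrop",
    privileges: [
        {
            // An empty collection string targets the database's non-system collections.
            resource: { db: "local", collection: "" },
            actions: ["find", "insert", "update", "remove"]
        }
    ],
    roles: []
};

// Hypothetical helper: returns true if the role document grants `action` on `dbName`.
function grantsAction(roleDoc, dbName, action) {
    return roleDoc.privileges.some(function (p) {
        return p.resource.db === dbName && p.actions.indexOf(action) !== -1;
    });
}

// In the mongo shell, the role could then be created with something like:
//   db.getSiblingDB("admin").createRole(localWriterNoDrop)
```

A user holding only this role could still write to collections in local, but a drop of `oplog.rs` should be rejected by access control because the `dropCollection` action is not granted.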

Comment by Eric Milkie [ 06/Mar/14 ]

We may end up moving the oplog to a custom, optimized replication solution and out of the general database structure, which would make protecting it easier.

Comment by Andreas Nilsson [ 06/Mar/14 ]

spencer, schwerin do we have a specific action type for this?

More specifically can we consider UDR solving this ticket from Eliot's perspective of being handled by the access control system?

Comment by Eliot Horowitz (Inactive) [ 26/Nov/13 ]

I think this should be handled by the security system.
Some users might be able to do such things, but regular users should not.

Comment by Eric Milkie [ 05/Mar/13 ]

I guess my point is there are a lot of other system related tables, even in the local database, where similar bad things might happen if you try to drop them on purpose. I don't think it's a bug that if you try to break your server on purpose, it breaks.

Comment by Scott Hernandez (Inactive) [ 05/Mar/13 ]

If you drop the primary oplog then all secondaries go into immediate "too stale" mode and all replication fails requiring a full resync, or promotion with no chance of rollback. All kinds of things break operationally and we cannot ensure normal system behavior.

Yes, there are malicious as well as accidental reasons this could be done. As it is logically a "system" collection used internally, it should not be user-droppable while it is in use. It is unexpected that it is not recreated automatically when an op is logged in the oplog (which is gone). I would have expected the system to gracefully recover (creating a new oplog), error out (since the write can be neither replicated nor saved for replication), or shut down (due to being in an invalid state: a replica without an oplog).

I'm not sure what refocusing is needed. It is a bug that it can be dropped, no matter what the reason.

Comment by Eric Milkie [ 05/Mar/13 ]

Is the concern that this will happen by accident? I think that's unlikely.
However, there is a concern that users with write access to the local database can drop important collections maliciously. Should this ticket be refocused on that aspect?
