[SERVER-25747] Using applyOps in "local" db does not record oplog entries for other databases Created: 23/Aug/16  Updated: 19/Nov/16  Resolved: 19/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: deyukong Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Suppose we insert and then delete the same record from a collection; this produces two oplog entries, call them I and D in natural order.
If we then run db.runCommand({applyOps: [D, I]}) (please note the reversed order), the record appears in the collection (visible via db.coll.find()), but no corresponding entry is written to the oplog.rs collection, so the primary and secondary diverge.
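The effect of the reversed order can be seen without a server. Below is a minimal sketch (plain Python, not MongoDB code) that simulates applying oplog-style entries to an in-memory store; the entry shape ("op"/"o") follows the oplog format, but apply_ops itself is an illustrative helper:

```python
# Simulate why applying entries in [D, I] order leaves the record present.
# Each entry is a dict with "op" ("i" = insert, "d" = delete) and "o" (the doc).

def apply_ops(store, ops):
    """Apply oplog-style entries to an in-memory dict keyed by _id."""
    for entry in ops:
        doc = entry["o"]
        if entry["op"] == "i":       # insert
            store[doc["_id"]] = doc
        elif entry["op"] == "d":     # delete (a no-op if the doc is absent)
            store.pop(doc["_id"], None)
    return store

I = {"op": "i", "o": {"_id": 1, "hello": "world"}}
D = {"op": "d", "o": {"_id": 1}}

print(apply_ops({}, [I, D]))  # natural order: record is gone -> {}
print(apply_ops({}, [D, I]))  # reversed order: delete is a no-op, record survives
```

The reversed order itself is harmless in isolation; the bug is that the resulting write is not reflected in oplog.rs when applyOps is run from the local database.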

Sprint: Repl 2016-10-10
Participants:

 Description   

Running applyOps with operations in reverse order leaves entries missing from the oplog.



 Comments   
Comment by Kelsey Schubert [ 19/Nov/16 ]

Hi wolf_kdy,

Thanks for your update and understanding. Accordingly, I'm going to close this ticket.

Kind regards,
Thomas

Comment by deyukong [ 19/Nov/16 ]

Hi @Thomas Schubert,
Sincere thanks for your reply.
That's OK, though you may want to note it somewhere. In fact, this case is quite unusual, so closing the issue without further action is acceptable to me.

As you said, applyOps is insufficient: it acquires the global lock to ensure the oplog has the same order between primary and secondary. We are also considering abandoning applyOps. Parsing the inner structure of each oplog entry we get from the primary and converting it into a CUD operation may be a better idea; it is also the strategy mongod uses to process oplog entries during initial sync and steady-state sync.
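The "parse each oplog entry and convert it to a CUD operation" strategy described above can be sketched as follows. The field names ("op", "ns", "o", "o2") follow the oplog entry format, but to_cud_op itself is a hypothetical helper for illustration, not MongoDB code:

```python
def to_cud_op(entry):
    """Translate one oplog entry into a (verb, namespace, payload) tuple."""
    op, ns = entry["op"], entry["ns"]
    if op == "i":
        return ("insert", ns, entry["o"])
    if op == "u":
        # "o2" identifies the target document; "o" holds the update to apply.
        return ("update", ns, {"query": entry["o2"], "update": entry["o"]})
    if op == "d":
        return ("delete", ns, entry["o"])
    raise ValueError("unhandled op type: %r" % op)

print(to_cud_op({"op": "i", "ns": "test.test1", "o": {"hello2": "world2"}}))
# ('insert', 'test.test1', {'hello2': 'world2'})
```

Replaying the converted operations through normal client writes lets the destination generate its own oplog entries, avoiding the applyOps pitfall entirely.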

Comment by Kelsey Schubert [ 14/Nov/16 ]

Hi wolf_kdy,

Sorry for the delay getting back to you. We document that the local database is not replicated here (but don't say anything specific about commands): https://docs.mongodb.com/v3.2/reference/local-database/#overview

We can add documentation to applyOps to note the replication behavior, but the docs already state that "The applyOps command is primarily an internal command"; it should not be used by clients, as its behavior is not fully documented and may change without warning. The applyOps command is insufficient for building replication, and there may be problems trying to use it that way.

What other documentation would have helped you? I'd be happy to open a DOCS ticket on your behalf.

Thank you,
Thomas

Comment by deyukong [ 23/Aug/16 ]

I am part of a professional team aiming to scale MongoDB clusters across 1000+ machines.
We find it inflexible to use mongos for migration.
After several weeks of research, we decided it is better to migrate using a snapshot plus an oplog delta instead. We use applyOps to retrieve oplog entries from the source and apply them on the destination.
The local database was not deliberately chosen; it was an accident. We found that oplog entries would sometimes be lost, but did not connect this to the local database until we looked into your source code.

Whether to fix it is your decision; I just think it my duty to report it ^_^. But please at least note the behavior in your documentation if it won't be fixed.

Comment by Scott Hernandez (Inactive) [ 23/Aug/16 ]

Thanks for the info.

For your use-case, I was more interested in the why part, not reproducing this behavior; Why is applyOps being used, and why in the local database?

Since applyOps is an internal command, we don't have complete documentation on it, unfortunately. At the very least we can correct that.

Comment by deyukong [ 23/Aug/16 ]

@Scott Hernandez
1) The use case for applyOps is trivial to reproduce, and I'm glad to describe it:
a: prepare two independent mongod instances (replica sets are better)
b: create a database named test in both instances
c: in instance one, run: use test; db.test1.insert({"hello1": "world1"})
d: in instance two, run: use test; db.test1.insert({"hello2": "world2"})
e: in instance two, run: use local; db.oplog.rs.find().sort({"$natural": -1}).limit(3) to find the insert oplog entry
f: in instance one, run: use local; db.runCommand({"applyOps": [ xxx (the oplog entry retrieved in step e) ]})
g: you will then, surprisingly, find {"hello2": "world2"} inserted into db.test1 in instance one, but no corresponding entry in oplog.rs
2) Yes, running applyOps from a database other than "local" avoids this issue. The problem is that when it is run from "local", the data described by the oplog entries is applied to the user collection, but the entries themselves are not appended to local.oplog.rs. Replication between primary and secondary becomes inconsistent, which means some data will be lost.
3) The reason is quite simple, and I have figured it out: the function _logOpRS returns immediately when ns starts with "local.".

Comment by Scott Hernandez (Inactive) [ 23/Aug/16 ]

We will look into creating a reproduction case and understanding the scope of the issue.

Can you describe your use-case for using applyOps, and specifically from the "local" database?

From your description it also sounds like you can avoid this issue by running the applyOps command from any database other than "local", is that correct?

Comment by deyukong [ 23/Aug/16 ]

Edit: the reason above is not correct. After a day of debugging and reviewing the mongod source code,
I finally grasped the point:
204 void _logOpRS(OperationContext* txn,
205               const char* opstr,
206               const char* ns,
207               const char* logNS,
208               const BSONObj& obj,
209               BSONObj* o2,
210               bool* bb,
211               bool fromMigrate) {
212     if (strncmp(ns, "local.", 6) == 0) {
213         return;
214     }

Assume there are two databases:
1) the local database, which stores oplog.rs
2) a user-owned database named test
If we switch to the local database (via "use local") and run db.runCommand({'applyOps': [....]}) to apply some oplog entries to collections inside the test database,
the entries are successfully applied to the correct collections, but nothing is added to oplog.rs. Line 212 causes all of this.
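The C++ check above is just a "local." prefix test on the namespace, which a minimal Python sketch (illustrative only, not MongoDB code) makes explicit: any operation whose namespace is attributed to the local database never reaches oplog.rs.

```python
def should_log_op(ns):
    """Mirror the C++ check: strncmp(ns, "local.", 6) == 0 means skip logging."""
    return not ns.startswith("local.")

print(should_log_op("test.test1"))  # True  -> would be logged to oplog.rs
print(should_log_op("local.me"))    # False -> skipped, as at line 212
```

The check itself is intentional (the local database is never replicated); the surprise is that it also suppresses the oplog entry for an applyOps command issued from the local database, even when the operations inside it target other databases.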

Generated at Thu Feb 08 04:10:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.