[SERVER-9119] An issue with oplog and a slave in MongoDB 2.4.1 Created: 25/Mar/13  Updated: 10/Dec/14  Resolved: 27/Mar/13

Status: Closed
Project: Core Server
Component/s: Logging, Replication
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Daniel Karp Assignee: Richard Kreuter (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

A virtual machine running CentOS 6.3


Issue Links:
Duplicate
duplicates SERVER-9085 db.replSetInfo() issue with convertin... Closed
Participants:

 Description   

I'm creating this question at Ian Whalen's suggestion. It may be that some of this will become irrelevant if we decide to convert to replica sets, and I'm submitting it in part in case it might be of use to the MongoDB developers.

Below is the text of the relevant part of my email to Ian:

We are getting errors that suggest our oplog has gone out of sync: the dates seem corrupted, and perhaps there is some issue with the slave. We haven't converted to replica sets; we are still using legacy master-slave replication.

Our question is: what can we do to fix this situation (see the detailed info below)? Can we do a repair? Do we have to restart the oplog from scratch? Do we need to just bite the bullet and learn how to upgrade to replica sets? Notice the "upgradeNeeded": true in the output of db.slaves.find(). I don't understand that, since both machines are now on 2.4.1.

Anyway, thank you so much for your time!

configured oplog size: 44792.567773437506MB
log length start to end: 5613.861999999965secs (1.56hrs)
oplog first event time: Fri Jan 16 1970 11:15:32 GMT-0600 (CST)
oplog last event time: Fri Jan 16 1970 12:49:06 GMT-0600 (CST)
now: Sun Mar 24 2013 18:04:52 GMT-0500 (CDT)

db.slaves.find()

{ "_id" : ObjectId("50f463aebc26d2d32eb20df1"), "host" : "72.233.16.182", "ns" : "local.oplog.$main", "syncedTo" : { "t" : 1363746775, "i" : 1 } }

{ "_id" : ObjectId("514ea0359cf7fda50eceafa5"), "config" : { "host" : "72.233.16.182:37402", "upgradeNeeded" : true }, "ns" : "local.oplog.$main", "syncedTo" : { "t" : 1363746785, "i" : 1 } }

db.printSlaveReplicationInfo()

source: submongo.geekdo.com
    syncedTo: Fri Jan 16 1970 12:49:06 GMT-0600 (CST)
    = 1362799568 secs ago (378555.44hrs)

Our oplog also seems massively huge: 43 GB.

In our slave log, we see things like:

Sun Mar 24 18:10:02.706 [replslave] repl: syncing from host:submongo.geekdo.com

so it seems to think it is syncing, but we don't really know whether to trust it.



 Comments   
Comment by Daniel Karp [ 27/Mar/13 ]

That's fine! We will look into getting switched over to Replica sets as well.

Comment by Richard Kreuter (Inactive) [ 27/Mar/13 ]

@Daniel: since it seems this might be a duplicate of a separate issue, I'm resolving this one for now. Feel free to reopen if you have further questions.

Comment by Richard Kreuter (Inactive) [ 25/Mar/13 ]

Hi Daniel,

It looks as though there is a reporting error in the shell, which accounts for the timestamp-related weirdness in the output of db.printSlaveReplicationInfo() above. Have a look at SERVER-9085 for a bit more info and a link to the commit that should fix this error.

One way to examine the contents of the oplog without using the shell would be to have a look at what mongoexport prints out, e.g.,

mongoexport -d local -c 'oplog.$main' | head

On my system just now, the first line of oplog contents looked like this:

{ "ts" : { "$timestamp" : { "t" : 1364233256, "i" : 1 } }, "op" : "n", "ns" : "", "o" : {} }

And the "t" field inside the timestamp is a number of seconds since 1970.
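
To check a "t" value by hand, a minimal Python sketch follows. It parses the sample mongoexport line above and converts "t" to a UTC date. It also shows what happens if the same number is read as milliseconds rather than seconds: the syncedTo value from the db.slaves.find() output then lands on 16 January 1970, which lines up with the dates reported in this ticket. That reading is only an inference from the numbers here, not a statement of what SERVER-9085 fixes. The helper names as_seconds and as_millis are made up for this illustration.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_seconds(t):
    """Interpret t as seconds since 1970 (what the "t" field holds)."""
    return EPOCH + timedelta(seconds=t)


def as_millis(t):
    """Interpret the same number as milliseconds (assumed misreading)."""
    return EPOCH + timedelta(milliseconds=t)


# Sample line as printed by mongoexport in the comment above.
line = '{ "ts" : { "$timestamp" : { "t" : 1364233256, "i" : 1 } }, "op" : "n", "ns" : "", "o" : {} }'
t_sample = json.loads(line)["ts"]["$timestamp"]["t"]
print("sample oplog entry:", as_seconds(t_sample).isoformat())

# syncedTo value taken from the db.slaves.find() output in the description.
t_synced = 1363746785
print("syncedTo as seconds:", as_seconds(t_synced).isoformat())
print("syncedTo as millis: ", as_millis(t_synced).isoformat())
```

Read as seconds, the syncedTo value falls on 20 March 2013, a few days before the ticket was filed, which would suggest the stored oplog timestamps themselves are sane.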

As for the size of the oplog: to a first approximation, the local database should be about the same size as the oplog collection. How much space on disk do the local.* files take up on your master node?
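
One way to answer the disk-space question is to sum the sizes of the local.* files in the data directory. The sketch below does that in Python. The function name local_db_bytes is hypothetical, and the dbpath in the usage note is an assumption; substitute whatever dbpath your mongod is actually started with. It also assumes the files sit directly in the dbpath rather than in a per-database subdirectory.

```python
from pathlib import Path


def local_db_bytes(dbpath):
    """Return the total size in bytes of the local.* files directly under dbpath."""
    total = 0
    for entry in Path(dbpath).glob("local.*"):
        # Count regular files only; skip anything else that matches the pattern.
        if entry.is_file():
            total += entry.stat().st_size
    return total


def to_gib(n_bytes):
    """Convert a byte count to GiB for easier comparison with the oplog size."""
    return n_bytes / (1024 ** 3)
```

For example, print(round(to_gib(local_db_bytes("/var/lib/mongo")), 2)) would report the total in GiB, where /var/lib/mongo is only a placeholder path. The result can then be compared with the configured oplog size reported in the description.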
