[SERVER-9119] An issue with oplog and a slave in MongoDB 2.4.1 Created: 25/Mar/13 Updated: 10/Dec/14 Resolved: 27/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Logging, Replication |
| Affects Version/s: | 2.4.1 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Daniel Karp | Assignee: | Richard Kreuter (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
A virtual machine running CentOS 6.3 |
||
| Issue Links: |
|
||||||||
| Participants: | |||||||||
| Description |
|
I'm creating this question at Ian Whalen's suggestion. Some of this may become irrelevant if we decide to convert to replica sets; I'm submitting it partly in case it is of use to the MongoDB developers. Below is the relevant part of my email to Ian:

We are getting errors that suggest our oplog has gone out of sync: the dates seem corrupted, and there may be an issue with the slave. We haven't converted to replica sets; we are using legacy master-slave replication.

Our question is: what can we do to fix this situation (detailed info below)? Can we do a repair? Do we have to restart the oplog from scratch? Or do we just need to bite the bullet and learn how to upgrade to replica sets?

Notice the "upgradeNeeded": true in the output of db.slaves.find(). I don't understand that, since they are both now on 2.4.1. Anyway, thank you so much for your time!

configured oplog size: 44792.567773437506MB

db.slaves.find()
{ "_id" : ObjectId("50f463aebc26d2d32eb20df1"), "host" : "72.233.16.182", "ns" : "local.oplog.$main", "syncedTo" : { "t" : 1363746775, "i" : 1 }}
{ "_id" : ObjectId("514ea0359cf7fda50eceafa5"), "config" : { "host" : "72.233.16.182:37402", "upgradeNeeded" : true }, "ns" : "local.oplog.$main", "syncedTo" : { "t" : 1363746785, "i" : 1 }}

db.printSlaveReplicationInfo()
source: submongo.geekdo.com
syncedTo: Fri Jan 16 1970 12:49:06 GMT-0600 (CST)
= 1362799568 secs ago (378555.44hrs)

Our oplog also seems massively large, at 43 GB.

In our slave log, we see things like:

Sun Mar 24 18:10:02.706 [replslave] repl: syncing from host:submongo.geekdo.com

so it seems to think it is syncing, but we don't really know whether to trust it. |
| Comments |
| Comment by Daniel Karp [ 27/Mar/13 ] | ||
|
That's fine! We will look into getting switched over to Replica sets as well. | ||
| Comment by Richard Kreuter (Inactive) [ 27/Mar/13 ] | ||
|
@Daniel: since it seems this might be a duplicate of a separate issue, I'm resolving this one for now. Feel free to reopen if you have further questions. | ||
| Comment by Richard Kreuter (Inactive) [ 25/Mar/13 ] | ||
|
Hi Daniel,

It looks as though there is a reporting error in the shell, which accounts for the timestamp-related weirdness in the output of db.printSlaveReplicationInfo() above. Have a look at

One way to examine the contents of the oplog without using the shell would be to have a look at what mongoexport prints out, e.g.,
On my system just now, the first line of oplog contents looked like this:
The "t" field inside the timestamp is the number of seconds since 1970.

As for the size of the oplog: to a first approximation, the local database should be about the same size as the oplog collection. How much space on disk do the local.* files take up on your master node? |
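The point about the "t" field can be checked with a short script. This is a minimal sketch, not part of the original thread: it takes the first syncedTo value printed by db.slaves.find() in the description (t = 1363746775) and converts it once as seconds since 1970 and once, mistakenly, as milliseconds. The second reading reproduces the 1970 date shown by db.printSlaveReplicationInfo(), which is consistent with a display problem rather than corrupted oplog data. That the shell misreads the unit in exactly this way is an assumption, and the helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# "t" from the first syncedTo entry printed by db.slaves.find() in the description.
SYNCED_TO_T = 1363746775

# Fixed UTC-6 offset, matching the "GMT-0600 (CST)" shown in the shell output.
CST = timezone(timedelta(hours=-6))


def as_seconds(t):
    """Interpret t as seconds since 1970 (the intended meaning of "t")."""
    return EPOCH + timedelta(seconds=t)


def as_milliseconds(t):
    """Interpret t as milliseconds since 1970 (assumed misreading)."""
    return EPOCH + timedelta(milliseconds=t)


print("read as seconds:     ", as_seconds(SYNCED_TO_T))
print("read as milliseconds:", as_milliseconds(SYNCED_TO_T).astimezone(CST))
```

Read as seconds, the value falls on 20 March 2013, a few days before the ticket was filed. Read as milliseconds, it lands on 16 January 1970 at 12:49:06 in UTC-6, the same date and time as the shell output.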