[SERVER-3283] [replica set sync] local.oplog.rs Assertion failure isOk() db/pdfile.h 259 Created: 17/Jun/11 Updated: 30/Mar/12 Resolved: 05/Oct/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael Conigliaro | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu on EC2 |
| Operating System: | Linux |
| Participants: |
| Description |
|
I started seeing this assertion error on one of the slaves in one of my replica sets. It seems to have caused replication to stop.

Fri Jun 17 04:15:24 [conn1059] query admin.$cmd ntoreturn:1 command: { writebacklisten: ObjectId('4ded68a18601f9f32f826c44') } reslen:60 300009ms
reslen:60 300007ms
reslen:60 300009ms
reslen:60 300009ms
reslen:60 300010ms
reslen:60 300007ms
reslen:60 300008ms
reslen:60 300009ms
reslen:60 300008ms
reslen:60 300008ms
reslen:60 300008ms
reslen:60 300008ms
reslen:60 300008ms

Restarting the slave has had no effect, as you can see:

Fri Jun 17 15:26:24 [initandlisten] MongoDB starting : pid=27662 port=27018 dbpath=/var/lib/mongodb 64-bit |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 12/Sep/11 ] |
|
You could run a repair(), but repair() will remove any corrupted data. This means that if you have a "clean" server, it might have records that this server doesn't after a repair. Thus, it isn't going to be exactly a clean copy of your data post-repair: it'll be a clean copy with some documents missing. Also, keep in mind that whatever server you got this data directory from probably had corruption, too, so you should repair or resync from a known clean version on that server, too. |
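For reference, a repair as described above can be run either offline against the data files or from the mongo shell on a running instance. This is a sketch; the dbpath is taken from the log in the description, but verify it against your own deployment before running anything:

```shell
# Offline repair: stop the affected mongod first, then run repair
# directly against its data files (rewrites files, dropping any
# records it cannot read cleanly):
mongod --dbpath /var/lib/mongodb --repair

# Alternatively, from the mongo shell on a running instance,
# repair a single database:
#   > use mydb
#   > db.repairDatabase()
```

As the comment notes, repair discards corrupted records rather than recovering them, so compare record counts against a known-clean member afterwards.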
| Comment by Matt Parlane [ 11/Sep/11 ] |
|
I've just been bitten by this exact bug, and it was (most probably) because of an unclean shutdown. I initially synced the slave with a copy of the data directory rather than using MongoDB's built-in sync, and there is a good chance that the server which was the origin of that data was at one point run without journalling. Anyway, is there anything I can do to resolve the situation now without resyncing? The replica is on another continent and it takes foreeevveeeerrr to resync. Also, let me know if you need any info to debug. My backtrace looks almost identical to the one above. |
| Comment by Kristina Chodorow (Inactive) [ 30/Jun/11 ] |
|
Yes, it will only be detected if something tries to use the corrupted section, so theoretically it could exist forever without being detected. |
| Comment by Michael Conigliaro [ 29/Jun/11 ] |
|
Well, once 1.8 came out, I enabled journaling everywhere. Could DB corruption really go that long without being detected? |
| Comment by Kristina Chodorow (Inactive) [ 24/Jun/11 ] |
|
But was this server ever running without journaling? |
| Comment by Michael Conigliaro [ 23/Jun/11 ] |
|
Yes, that's why I suspected something was wrong with the init script. I brought this up before on IRC, but nobody could reproduce it, and it's never really been a problem (mostly just an annoyance). And yes, I am running with journaling. |
| Comment by Kristina Chodorow (Inactive) [ 23/Jun/11 ] |
|
That sounds like it's shutting down uncleanly! (which shouldn't actually be a problem if you're running with journaling, but would be a problem if it used to run w/out journaling.) |
| Comment by Michael Conigliaro [ 23/Jun/11 ] |
|
Oh, not that I know of. Although for the record, I suspect there might be something wrong with the init script on Ubuntu. Often I will do a restart with the init script, check the logs, and see that MongoDB is recovering from the journal. I'm not sure what that's all about, or if it matters at all. |
| Comment by Kristina Chodorow (Inactive) [ 23/Jun/11 ] |
|
No, I mean "at any time in the history of the world," not necessarily right before. |
| Comment by Michael Conigliaro [ 23/Jun/11 ] |
|
No unclean shutdown that I know of. That log snippet I posted is actually where those messages started, so if there had been a shutdown, I imagine we would see MongoDB startup messages in there. |
| Comment by Kristina Chodorow (Inactive) [ 21/Jun/11 ] |
|
It looks like corruption in the slave, so probably for the best. Did you have an unclean shutdown? |
| Comment by Michael Conigliaro [ 17/Jun/11 ] |
|
I ultimately just deleted all the data on the slave and forced a resync. |
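The full-resync approach described above might look roughly like the following. This is a sketch, not the reporter's exact commands; the replica set name is a placeholder, while the dbpath and port are taken from the startup line in the description:

```shell
# 1. Stop the mongod on the corrupted member.

# 2. Remove its data files so the member performs a fresh initial
#    sync when it rejoins the set (destroys all local data!):
rm -rf /var/lib/mongodb/*

# 3. Restart the member; it re-syncs from the primary automatically.
#    "myReplSet" is a placeholder for the actual replica set name.
mongod --replSet myReplSet --dbpath /var/lib/mongodb --port 27018
```

As Matt notes above, the downside is that an initial sync copies the entire dataset over the network, which can be very slow between continents.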