[SERVER-6012] Seg Fault, Invalid access at address, Logstream::get called in uninitialized state, on 64bit mongodb 2.0.6 Created: 05/Jun/12 Updated: 15/Aug/12 Resolved: 07/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code, Logging |
| Affects Version/s: | 2.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Alex Gaudio | Assignee: | Eric Milkie |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | crash | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
$ uname -a
$ mongod --version
|
| Operating System: | Linux |
| Participants: |
| Description |
|
Hello. I'm hoping you can help me figure out why my mongod instance crashes when I execute a certain query. Whenever I run the query below, I consistently kill the running mongod instance and receive the same error message. The query runs against a collection containing 170,941,610 documents, which has an index on the field "transformed".

Tue Jun 5 22:40:25 [conn933] query <DB>.stats_cumulative_stats query: { transformed: false } ntoreturn:500000 nscanned:58255 nreturned:58254 reslen:4194308 607ms

The log shows this error:

Tue Jun 5 22:40:33 Invalid access at address: 0x3d27000
Tue Jun 5 22:40:33 Got signal: 11 (Segmentation fault).
Tue Jun 5 22:40:33 Backtrace: Logstream::get called in uninitialized state

To recover the server, I follow these steps (my db is journaled). First, I start it up with /etc/init.d/mongodb start. At this point I cannot log in via the mongo client; "mongo --host <HOST>" prints "MongoDB shell version: 2.0.4" and then "connect failed". Then I restart the server via /etc/init.d/mongodb restart, and I can log in as usual via the mongo client. |
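For reference, a minimal shell sketch of the same query shape. The database name and host below are hypothetical placeholders for the redacted <DB> and <HOST>, not values from this ticket:

# Sketch only: issues the query shape from the log line above.
# "mydb" and "db.example.com" are placeholders.
$ mongo --host db.example.com mydb --eval 'db.stats_cumulative_stats.find({ transformed: false }).limit(500000).itcount()'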
| Comments |
| Comment by Eric Milkie [ 07/Aug/12 ] | |||
|
Closing as incomplete; we will watch for this crash to happen again. | |||
| Comment by Eric Milkie [ 26/Jun/12 ] | |||
|
Hi Alex, | |||
| Comment by Alex Gaudio [ 26/Jun/12 ] | |||
|
Hey Eric, thanks for your response - I repaired the collection and am operating normally. At this point, I'm confused about why the server crashed and corrupted my data when the log partition filled up, given that the db data and logs are on two separate partitions. I wouldn't be surprised if there's a bug or two that contributed to the problem. Also, a possible future feature: given mongo's instability when it runs out of disk space mid-write, it may be helpful to add some sort of corruption protection, or the ability to shut down the server instance when the server log approaches max disk usage. Thanks for your help. Alex | |||
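A rough external approximation of that requested feature, as a sketch only: the mount point, the 90% threshold, and the choice to shut the server down are all assumptions, not anything MongoDB provides.

#!/bin/sh
# Hypothetical watchdog (e.g. run from cron): if the log partition passes
# 90% usage, shut mongod down cleanly before a full disk can corrupt an
# in-flight write. Path and threshold are examples only.
USAGE=$(df -P /var/logs/mongodb | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$USAGE" -ge 90 ]; then
    mongo admin --eval 'db.shutdownServer()'
fi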
| Comment by Eric Milkie [ 15/Jun/12 ] | |||
|
Let me see if I can surmise what happened here.
The move of the file does not affect the open file handle, so mongod keeps attempting to extend the file (but can't). At this point, it would seem that your datafiles are corrupt. The usual procedures for recovery apply here: restore from a backup, or from another node in a replica set, or, failing that, run a repair. A repair might lose any number of records and will only fix structural corruption; data corruption could remain unless you have a way of checking for it yourself. | |||
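As a sketch of the repair option mentioned above (the dbpath and database name are examples, not values from this ticket):

# Offline repair: stop mongod first, then run it against the data directory.
$ mongod --repair --dbpath /var/lib/mongodb
# Or repair a single database from the mongo shell on a running server:
$ mongo mydb --eval 'printjson(db.repairDatabase())'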
| Comment by Alex Gaudio [ 11/Jun/12 ] | |||
|
>> Did you delete the log file while the server was still running?
>> What do you mean specifically by "forked another mongod instance"?

Sorry, I mis-spoke a little. Here's basically a replay of what happened when I deleted the log file and managed to restart the server:

$ df -h
$ ps -ef | grep mongo
$ sudo mv /var/logs/mongodb/{mongodb.log,mongodb.log2} ; sudo rm -rf /var/logs/mongodb/mongodb.log2

>> Do you have a log from the server where you attempted to start it after the failure and yet couldn't connect?

I don't have that log anymore, but if I remember correctly, I don't think the server logged anything (which suggests that perhaps the server log was still somehow "delete-pending"?). I just tried to regenerate the situation where I couldn't connect my mongo client to the server, but I can't reproduce that error anymore. Maybe I can't reproduce it because I removed the indexes on this collection last Friday. That said, while my server hasn't been freezing up since I deleted the indexes, I still get an error about corrupt data from the mongo shell. The corresponding log entry (similar to the initial log I posted):

<<line about conn1 authentication>>
Mon Jun 11 22:20:48 [conn1] query <DB>.stats_cumulative_stats query: { parsed: false } exception: BSONElement: bad type -4

----------------------
1. partition filled up mid-write and corrupted data. | |||
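One way to check whether that collection still has structural damage, as a sketch; "mydb" is a placeholder for the redacted <DB> above:

# Full validation of the suspect collection from the mongo shell.
$ mongo mydb --eval 'printjson(db.stats_cumulative_stats.validate(true))'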
| Comment by Eric Milkie [ 11/Jun/12 ] | |||
|
Hi Alex. Did you delete the log file while the server was still running? What do you mean specifically by "forked another mongod instance"? Do you have a log from the server where you attempted to start it after the failure and yet couldn't connect? | |||
| Comment by Alex Gaudio [ 11/Jun/12 ] | |||
|
Hi Eric, thanks for taking this one. I should also add that this might be due to an event that occurred a few days before I filed this bug: namely, our log partition filled up and locked the mongo server. After I deleted the log file, the mongod instance wouldn't shut down (i.e. "/etc/init.d/mongodb stop" would just hang). I didn't explicitly kill -9 the mongod instance, but instead forked another mongod instance and used /etc/init.d/mongodb stop to shut down both. I should also mention that we are not replicated or sharded. Hope that helps! Thanks, Alex |
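For future reference, a sketch of rotating the server log in place instead of deleting it out from under the running process (mongod renames the current log file with a timestamp and keeps writing to a fresh one):

# Ask the running mongod to close and reopen its log file.
$ mongo admin --eval 'printjson(db.adminCommand({ logRotate: 1 }))'
# Equivalent signal-based approach on Linux:
$ kill -USR1 "$(pidof mongod)"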