[SERVER-14045] Cannot start mongodb Created: 25/May/14  Updated: 14/Nov/14  Resolved: 14/Nov/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Amin Cheloh Assignee: Ramon Fernandez Marina
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux node.xyz.com 3.11.0-20-generic #35~precise1-Ubuntu SMP Fri May 2 21:32:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


Attachments: Text File mongo.log    
Issue Links:
Related
Operating System: Linux
Participants:

 Description   

After reboot server i cannot start mongodb server and i attach log file



 Comments   
Comment by Ramon Fernandez Marina [ 14/Nov/14 ]

nineamin, since we haven't heard back from you for some time we're going to close this ticket. If this is still an issue for you feel free to reopen the ticket and provide additional details about your setup to see if you're affected by SERVER-15369 or this is a completely different issue.

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 17/Oct/14 ]

nineamin, is this still an issue for you? If the answer is yes, have you checked SERVER-15369 to see if you're running a configuration that may be affected by that issue?

Comment by Ramon Fernandez Marina [ 07/Oct/14 ]

nineamin, are you running on VMWare by any chance? I'm asking because of SERVER-15369, where we've found a bug on Linux systems running on VMWare that affects MongoDB operation.

Comment by Tyler Brock [ 06/Jun/14 ]

I'm going to mark this as resolved for the time being, let me know if you want us to dig further or you have any more questions. Thanks.

Comment by Tyler Brock [ 04/Jun/14 ]

Amin thank you for sending us all of the files. Using a hex dump utility we were able to examine the "admin.ns" file which had several good entries and then what appears to be garbage written in the middle of the file. I've copied an series of excerpts from the hex dump below so you can see this for yourself:

... several valid ns entries
00044d70  00 00 00 00 14 37 2c 4b  61 64 6d 69 6e 2e 73 79  |.....7,Kadmin.sy|
00044d80  73 74 65 6d 2e 69 6e 64  65 78 65 73 00 00 00 00  |stem.indexes....|
00044d90  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
00094df0  00 00 00 00 00 00 00 00  00 00 00 00 34 c3 14 0e  |............4...|
00094e00  61 64 6d 69 6e 2e 73 79  73 74 65 6d 2e 6e 61 6d  |admin.system.nam|
00094e10  65 73 70 61 63 65 73 00  00 00 00 00 00 00 00 00  |espaces.........|
00094e20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
00356250  00 00 00 00 00 00 00 00  00 00 00 00 88 c8 33 18  |..............3.|
00356260  61 64 6d 69 6e 2e 73 79  73 74 65 6d 2e 76 65 72  |admin.system.ver|
00356270  73 69 6f 6e 00 00 00 00  00 00 00 00 00 00 00 00  |sion............|
00356280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
.... then garbage starting on a 4KB boundary:
00efa000  62 35 38 65 66 34 63 62  64 37 63 62 30 31 62 61  |b58ef4cbd7cb01ba|
00efa010  33 66 38 64 34 31 39 36  31 32 63 00 65 6e 00 36  |3f8d419612c.en.6|
00efa020  36 62 36 31 38 63 32 63  39 36 64 62 62 62 63 66  |6b618c2c96dbbbcf|
00efa030  66 65 66 62 61 66 64 64  62 66 38 34 38 66 64 00  |fefbafddbf848fd.|
00efa040  65 6e 00 62 33 33 36 66  38 33 65 33 30 38 33 30  |en.b336f83e30830|
00efa050  61 61 30 34 39 64 61 39  33 62 39 38 37 32 36 66  |aa049da93b98726f|
00efa060  65 64 65 00 65 6e 00 33  65 36 37 61 63 33 33 33  |ede.en.3e67ac333|
00efa070  66 33 61 61 66 36 33 37  32 61 39 65 32 63 65 39  |f3aaf6372a9e2ce9|
00efa080  64 33 65 32 61 32 66 00  65 6e 00 34 33 39 66 65  |d3e2a2f.en.439fe|
00efa090  39 65 62 62 30 30 64 64  38 33 63 35 61 63 37 36  |9ebb00dd83c5ac76|
00efa0a0  39 63 33 34 61 32 33 65  64 33 35 00 65 6e 00 32  |9c34a23ed35.en.2|
00efa0b0  62 30 35 65 31 63 36 39  37 31 65 31 61 30 35 34  |b05e1c6971e1a054|
00efa0c0  32 62 64 36 37 37 65 35  61 63 64 39 33 35 39 00  |2bd677e5acd9359.|
... more of the same, then changes to different garbage:
00efbc70  34 66 36 61 31 34 62 31  62 38 38 34 65 34 66 64  |4f6a14b1b884e4fd|
00efbc80  37 64 37 00 65 6e 00 31  31 31 37 63 61 35 33 61  |7d7.en.1117ca53a|
00efbc90  65 30 37 30 62 32 38 39  34 35 64 32 33 64 32 66  |e070b28945d23d2f|
00efbca0  35 61 34 66 65 33 37 00  00 00 00 00 00 00 00 00  |5a4fe37.........|
00efbcb0  00 00 00 00 00 00 00 00  b8 0c f0 01 bb 0c f0 01  |................|
00efbcc0  de a7 14 00 00 00 00 00  5f 24 03 00 68 e5 00 00  |........_$..h...|
00efbcd0  5c 85 06 00 00 00 00 00  a0 ec 5a 00 00 00 00 00  |\.........Z.....|
00efbce0  95 03 00 00 00 00 00 00  dc 0c f0 01 df 0c f0 01  |................|
00efbcf0  e0 a7 14 00 00 00 00 00  60 24 03 00 69 e5 00 00  |........`$..i...|
00efbd00  5c 85 06 00 00 00 00 00  36 f0 5a 00 00 00 00 00  |\.......6.Z.....|
00efbd10  9e 03 00 00 00 00 00 00  00 0d f0 01 03 0d f0 01  |................|
00efbd20  e2 a7 14 00 00 00 00 00  63 24 03 00 6a e5 00 00  |........c$..j...|
... more of the same
00efffb0  a8 aa 14 00 00 00 00 00  73 2d 03 00 cd e6 00 00  |........s-......|
00efffc0  5c 85 06 00 00 00 00 00  a3 a5 5d 00 00 00 00 00  |\.........].....|
00efffd0  ba 00 00 00 00 00 00 00  10 3f f0 01 13 3f f0 01  |.........?...?..|
00efffe0  aa aa 14 00 00 00 00 00  ae 37 03 00 ce e6 00 00  |.........7......|
00effff0  5c 85 06 00 00 00 00 00  5e a6 5d 00 00 00 00 00  |\.......^.].....|
00f00000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
... garbage seems to end on 4KB boundary, then one more valid ns entry
00f005e0  60 c7 06 2a 61 64 6d 69  6e 2e 73 79 73 74 65 6d  |`..*admin.system|
00f005f0  2e 75 73 65 72 73 00 00  00 00 00 00 00 00 00 00  |.users..........|
00f00600  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This is likely to be a storage related issue as the damage starts on a 4kb boundary and the data doesn't look like anything mongod would have written to the namespace files. We can take a look at /var/log/messages, the syslog, and/or dmesg to look for msync and other storage layer related errors if you would like but it doesn't seem like this namespace file corruption was the result of code in mongod. Can you provide any other information about your virtualization setup?

Comment by Amin Cheloh [ 01/Jun/14 ]

I don't know.

No, It's impossible.

Thanks

Comment by Tyler Brock [ 30/May/14 ]

Amin, do you know why the log file shows many databases with other names such as "8111b8026c549b4c26b"?

Is it possible that the actual data directory containing the vorchat database is somewhere other than the current dbpath mongo is using ("/var/lib/mongodb/")?

Comment by Amin Cheloh [ 28/May/14 ]

No, my database just only "vorchat". I also upload full /var/lib/mongodb/ backup (except journal directory) via scp

Thanks

Comment by Tyler Brock [ 27/May/14 ]

Hi Amin,

Thanks for uploading the files. I see you uploaded a file called vorchat.zip. When I unzip that file I see a single database file (vorchat.0) and a namespace file (vorchat.ns). Is this related to the "8111b8026c549b4c26b" database involved in this ticket?

Comment by Amin Cheloh [ 25/May/14 ]

my workaround is delete admin.0 and admin.ns database files now i can start mongodb server and need recreate users.

Comment by Amin Cheloh [ 25/May/14 ]

the log is as attached file

Comment by Eliot Horowitz (Inactive) [ 25/May/14 ]

Can you take a backup of the whole directory, and then remove the mongod.lock file and then try starting mongod

Comment by Amin Cheloh [ 25/May/14 ]

database files zip and attached

additional information:
the server is vm running on vmware esxi 5.5 and i reboot vm because upgrade cpu core from 1 to 4 core.

Thanks

Comment by Eliot Horowitz (Inactive) [ 25/May/14 ]

Do you know what collections if any should be in that database?
Could you attache those database files?
Can attached in a private ticket or ssh: scp -P 722 SERVER-14045@www.mongodb.com: <then use any password>

Comment by Amin Cheloh [ 25/May/14 ]

yes clean reboot

result:
root@node:~# mongodump --dbpath /var/lib/mongodb/ -d 8111b8026c549b4c26b
**************
Error: journal files are present in journal directory, yet starting without journaling enabled.
It is recommended that you start with journaling enabled so that recovery may occur.
**************

If you are running a mongod on the same path you should connect to that instead of direct data file access

2014-05-25T10:36:28.140+0700 [tools] dbexit:
2014-05-25T10:36:28.140+0700 [tools] shutdown: going to close listening sockets...
2014-05-25T10:36:28.140+0700 [tools] shutdown: going to flush diaglog...
2014-05-25T10:36:28.140+0700 [tools] shutdown: going to close sockets...
2014-05-25T10:36:28.140+0700 [tools] shutdown: waiting for fs preallocator...
2014-05-25T10:36:28.140+0700 [tools] shutdown: closing all files...
2014-05-25T10:36:28.140+0700 [tools] closeAllFiles() finished
2014-05-25T10:36:28.140+0700 [tools] shutdown: removing fs lock...
2014-05-25T10:36:28.140+0700 [tools] dbexit: really exiting now

so i try to run with --journal option
result:
root@node:~# mongodump --dbpath /var/lib/mongodb/ -d 8111b8026c549b4c26b --journal
2014-05-25T10:36:52.988+0700 [tools] journal dir=/var/lib/mongodb/journal
2014-05-25T10:36:52.988+0700 [tools] recover begin
2014-05-25T10:36:52.988+0700 [tools] info no lsn file in journal/ directory
2014-05-25T10:36:52.988+0700 [tools] recover lsn: 0
2014-05-25T10:36:52.988+0700 [tools] recover /var/lib/mongodb/journal/j._0
2014-05-25T10:36:53.005+0700 [tools] recover cleaning up
2014-05-25T10:36:53.005+0700 [tools] removeJournalFiles
2014-05-25T10:36:53.281+0700 [tools] recover done
2014-05-25T10:36:53.281+0700 [tools] preallocating a journal file /var/lib/mongodb/journal/prealloc.0
2014-05-25T10:37:24.017+0700 DATABASE: 8111b8026c549b4c26b to dump/8111b8026c549b4c26b
2014-05-25T10:37:24.029+0700 [tools] dbexit:
2014-05-25T10:37:24.029+0700 [tools] shutdown: going to close listening sockets...
2014-05-25T10:37:24.029+0700 [tools] shutdown: going to flush diaglog...
2014-05-25T10:37:24.029+0700 [tools] shutdown: going to close sockets...
2014-05-25T10:37:24.029+0700 [tools] shutdown: waiting for fs preallocator...
2014-05-25T10:37:24.029+0700 [tools] shutdown: lock for final commit...
2014-05-25T10:37:24.029+0700 [tools] shutdown: final commit...
2014-05-25T10:37:24.029+0700 [tools] shutdown: closing all files...
2014-05-25T10:37:24.029+0700 [tools] closeAllFiles() finished
2014-05-25T10:37:24.029+0700 [tools] journalCleanup...
2014-05-25T10:37:24.029+0700 [tools] removeJournalFiles
2014-05-25T10:37:24.075+0700 [tools] shutdown: removing fs lock...
2014-05-25T10:37:24.075+0700 [tools] dbexit: really exiting now

and
root@node:~# tree dump/
dump/
└── 8111b8026c549b4c26b

1 directory, 0 files

Comment by Eliot Horowitz (Inactive) [ 25/May/14 ]

can you try running:
mongodump --dbpath /var/lib/mongodb/ -d 8111b8026c549b4c26b

Was it a clean reboot?

Generated at Thu Feb 08 03:33:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.