[SERVER-5673] Crash during repairDatabase can leave the server unable to start up Created: 20/Apr/12  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: Admin, MMAPv1, Stability
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: fabio perrella Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

uname -a
Linux 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux


Attachments: File mongodb.log.tar.gz     File mongodb.log.tar.gz    
Assigned Teams:
Storage Execution
Operating System: Linux
Participants:

 Description   

After running db.repairDatabase(), one shard restarted OK, but the other does not start.

Logs from while db.repairDatabase() was running:
----- mongo log ----

Fri Apr 20 15:32:03 [conn26] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1337 477ms
13537900/28932981 46%
14795700/28932981 51%
16063300/28932981 55%
Fri Apr 20 15:32:33 [conn4] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1337 177ms

---------------------
----- syslog --------
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810663] lowmem_reserve[]: 0 0 0 0
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810667] Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 2*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7872kB
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810680] Node 0 DMA32: 1006*4kB 0*8kB 0*16kB 2*32kB 2*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8056kB
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810691] 2510 total pagecache pages
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810693] 674 pages in swap cache
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810695] Swap cache stats: add 127526, delete 126852, find 43057/43736
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810697] Free swap = 0kB
Apr 20 15:32:42 saas-di0017 kernel: [3050891.810699] Total swap = 471032kB
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 1048576 pages RAM
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 20309 pages reserved
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 3710 pages shared
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 1022293 pages non-shared
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Out of memory: kill process 26171 (mongod) score 158123 or a child
Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Killed process 26171 (mongod)
---------------------
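
The syslog excerpt above shows the kernel OOM killer terminating mongod (pid 26171) in the middle of the repair. As a minimal sketch, assuming syslog lines in the same format as the excerpt, the "Killed process" line can be picked out like this. The function name and the sample list are illustrative and are not part of the original report.

```python
import re

# Matches the kernel's "Killed process <pid> (<name>)" line as it appears
# in the syslog excerpt above. Other kernel versions may word this differently.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, process_name="mongod"):
    """Return a list of (pid, name) for OOM kills of process_name."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match and match.group(2) == process_name:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Two lines copied from the syslog excerpt in this ticket.
sample = [
    "Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Out of memory: kill process 26171 (mongod) score 158123 or a child",
    "Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Killed process 26171 (mongod)",
]
print(find_oom_kills(sample))  # [(26171, 'mongod')]
```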

Logs after trying to run "mongod restart":
----- mongo log ----

***** SERVER RESTARTED *****

Fri Apr 20 19:28:42 [initandlisten] MongoDB starting : pid=10584 port=27018 dbpath=/var/lib/mongodb 64-bit host=saas-di0017
Fri Apr 20 19:28:42 [initandlisten] db version v2.0.4, pdfile version 4.5
Fri Apr 20 19:28:42 [initandlisten] git version: nogitversion
Fri Apr 20 19:28:42 [initandlisten] build info: Linux hm4317 2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011 x86_64 BOOST_LIB_VERSION=1_42
Fri Apr 20 19:28:42 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", repair: true, rest: "true", shardsvr: "true" }

Fri Apr 20 19:28:42 [initandlisten] journal dir=/var/lib/mongodb/journal
Fri Apr 20 19:28:42 [initandlisten] recover begin
Fri Apr 20 19:28:42 [initandlisten] recover lsn: 329715219
Fri Apr 20 19:28:42 [initandlisten] recover /var/lib/mongodb/journal/j._13
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328530229 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328589489 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328648739 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329004279 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329063519 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329122759 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover /var/lib/mongodb/journal/j._14
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329596739 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329655969 < lsn:329715219
Fri Apr 20 19:28:42 [initandlisten] exception during recovery
Fri Apr 20 19:28:42 [initandlisten] exception in initAndListen std::exception: boost::filesystem::file_size: No such file or directory: "/var/lib/mongodb/$tmp_repairDatabase_0/emailmarketing_development.11", terminating
Fri Apr 20 19:28:42 dbexit:
Fri Apr 20 19:28:42 [initandlisten] shutdown: going to close listening sockets...
Fri Apr 20 19:28:42 [initandlisten] shutdown: going to flush diaglog...
Fri Apr 20 19:28:42 [initandlisten] shutdown: going to close sockets...
Fri Apr 20 19:28:42 [initandlisten] shutdown: waiting for fs preallocator...
Fri Apr 20 19:28:42 [initandlisten] shutdown: lock for final commit...
Fri Apr 20 19:28:42 [initandlisten] shutdown: final commit...
Fri Apr 20 19:28:42 [initandlisten] shutdown: closing all files...
Fri Apr 20 19:28:42 [initandlisten] closeAllFiles() finished
Fri Apr 20 19:28:42 [initandlisten] shutdown: removing fs lock...
Fri Apr 20 19:28:42 dbexit: really exiting now
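
In the startup log above, journal recovery aborts because it references a file under $tmp_repairDatabase_0 that no longer exists. As a rough sketch, assuming the exception line keeps the format shown in this log, the missing path can be extracted and checked for the temporary repair directory as follows. The helper name is hypothetical.

```python
import re

# Matches the fatal line from the log above and captures the quoted path.
MISSING_FILE_RE = re.compile(
    r'exception in initAndListen .*No such file or directory: "([^"]+)"'
)


def missing_recovery_file(log_lines):
    """Return the path named in the first matching exception line, or None."""
    for line in log_lines:
        match = MISSING_FILE_RE.search(line)
        if match:
            return match.group(1)
    return None


# The exception line copied from the log in this ticket.
log = [
    "Fri Apr 20 19:28:42 [initandlisten] exception during recovery",
    'Fri Apr 20 19:28:42 [initandlisten] exception in initAndListen std::exception: '
    'boost::filesystem::file_size: No such file or directory: '
    '"/var/lib/mongodb/$tmp_repairDatabase_0/emailmarketing_development.11", terminating',
]

path = missing_recovery_file(log)
# True when the missing file belonged to an interrupted repair's temp directory.
from_interrupted_repair = path is not None and "$tmp_repairDatabase_" in path
print(path, from_interrupted_repair)
```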



 Comments   
Comment by fabio perrella [ 23/Apr/12 ]

full log

Comment by fabio perrella [ 23/Apr/12 ]

I removed the journal log and it worked, thanks!
I'm attaching the complete log; search around Apr 20 14:00, which is when I started the repairDatabase.
One question: by removing the journal directory, could I have lost some data?

Thanks

Comment by Eliot Horowitz (Inactive) [ 22/Apr/12 ]

Looks like the repair itself failed.
Do you have the logs for that?

If the server won't start now, it's safe to remove the journal directory since the repair only touches new files.
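
A minimal sketch of the workaround described above, assuming mongod is stopped and that the dbpath is the /var/lib/mongodb shown in the startup log. Moving the journal directory to a backup location instead of deleting it is a cautious variation that is not in the original advice, and the function name and backup location are hypothetical.

```python
import os
import shutil
import time


def set_aside_journal(dbpath, backup_dir):
    """Move <dbpath>/journal into backup_dir and return the new location.

    Returns None if there is no journal directory. Assumes mongod is not
    running; this function does not check for a live process.
    """
    journal = os.path.join(dbpath, "journal")
    if not os.path.isdir(journal):
        return None
    # Timestamped name so repeated runs do not collide (assumption, not from the ticket).
    dest = os.path.join(backup_dir, "journal-%d" % int(time.time()))
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.move(journal, dest)
    return dest


# Example call (paths are illustrative; the backup directory must already exist):
# set_aside_journal("/var/lib/mongodb", "/var/backups")
```

Keeping the moved directory until the server has started and the data has been checked leaves a way back if the journal turns out to be needed.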
