[SERVER-4640] database corrupted Created: 06/Jan/12  Updated: 30/Mar/12  Resolved: 08/Jan/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Question Priority: Critical - P2
Reporter: jitendra Assignee: Unassigned
Resolution: Done Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux debian


Participants:

 Description   

We run mongod with journaling enabled. While it was running, we unmounted the disk and changed the disk settings. Two things happened:
1. Journal recovery took 15 minutes.
2. The database was corrupted, so we ran the validate command.

Please help us understand how to handle this problem in production.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 16/Jan/12 ]

If it gets corrupted you should run --repair.
If you are running with journaling (the default on 2.0.2), it should not get corrupted in the first place, though.

Comment by jitendra [ 16/Jan/12 ]

We have limited disk space and we are not using replica sets.
If corruption takes place, how should we handle it?

Comment by Eliot Horowitz (Inactive) [ 10/Jan/12 ]

If you get corruption you should either run repair or just wipe that member of the replica set and let it resync.
Very hard to tell if/what happened without any logs.
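
The validate-then-repair flow described here can be sketched roughly as below. This is a minimal illustration, not an official procedure: the helper name `collections_failing_validate`, the `fake_validate` stand-in, and the collection names `users` and `events` are all hypothetical. The only thing it relies on is that a validate result is a document carrying a boolean `valid` field.

```python
def collections_failing_validate(run_validate, collection_names):
    """Return the collections whose validate result is not marked valid.

    run_validate is any callable that takes a collection name and returns
    the validate result as a dict (assumed to carry a boolean "valid" key).
    """
    failing = []
    for name in collection_names:
        result = run_validate(name)
        # Treat a missing "valid" field as a failure, to stay on the safe side.
        if not result.get("valid", False):
            failing.append(name)
    return failing


def fake_validate(name):
    # Hypothetical canned results so the sketch runs without a mongod.
    canned = {
        "users": {"valid": True, "ok": 1.0},
        "events": {"valid": False, "ok": 1.0},
    }
    return canned[name]


print(collections_failing_validate(fake_validate, ["users", "events"]))
```

Against a real server, `run_validate` could plausibly be something like `lambda name: db.command("validate", name)` with PyMongo, though that wiring is an assumption and is not taken from this ticket. If any collection comes back as failing, the next step suggested in this thread is to run repair (mongod with --repair) or, on a replica set, to wipe the member and let it resync.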

Comment by jitendra [ 10/Jan/12 ]

I have seen corruption on the database once.
It started giving errors on insert. I looked at the logs; they said there was database corruption and that we had to validate the database, but we have since lost those logs.
What should we do when corruption takes place?
1. We can run validate, but it is too slow.
2. If validate reports a failure, we run repair, which is also a slow command.

Can you please give a recommendation for when corruption takes place?

Comment by Eliot Horowitz (Inactive) [ 09/Jan/12 ]

Why do you think there is corruption?

Comment by jitendra [ 09/Jan/12 ]

That's clear. One more thing, about database corruption:
1. Why does database corruption take place?
2. If database corruption happens, how should we handle it?

Comment by Eliot Horowitz (Inactive) [ 08/Jan/12 ]

Given that it took 2 minutes to allocate a 1 GB journal file, it's clear the filesystem is an issue.
Journal recovery will be very fast unless the disk is very slow - so that's the only sensible explanation.

Comment by jitendra [ 08/Jan/12 ]

ok

Can you please explain the logs below? Did recovery take so long because of the file system, or because of something else?

Fri Jan 6 13:44:37 [initandlisten] recover skipping application of section seq:22582843 < lsn:42621633
Fri Jan 6 13:44:38 [initandlisten] recover skipping application of section more...
Fri Jan 6 13:54:46 [initandlisten] recover /u01/shard3/master/journal/j._3
Fri Jan 6 14:02:32 [initandlisten] recover cleaning up
Fri Jan 6 14:02:32 [initandlisten] removeJournalFiles
Fri Jan 6 14:02:32 [initandlisten] recover done
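
To see where the recovery time went, the gaps between consecutive lines of the excerpt above can be computed. The following is a small sketch under two assumptions: all lines fall on the same day, and the fourth whitespace-separated field is the HH:MM:SS timestamp. The function name `gaps` is made up for illustration.

```python
from datetime import datetime

# Lines copied from the recovery log excerpt in this comment.
LOG = """\
Fri Jan 6 13:44:37 [initandlisten] recover skipping application of section seq:22582843 < lsn:42621633
Fri Jan 6 13:44:38 [initandlisten] recover skipping application of section more...
Fri Jan 6 13:54:46 [initandlisten] recover /u01/shard3/master/journal/j._3
Fri Jan 6 14:02:32 [initandlisten] recover cleaning up
Fri Jan 6 14:02:32 [initandlisten] removeJournalFiles
Fri Jan 6 14:02:32 [initandlisten] recover done
"""


def gaps(log_text):
    """Return (seconds_since_previous_line, message) for every line after the first."""
    rows = []
    prev = None
    for line in log_text.splitlines():
        # Fields: weekday, month, day, HH:MM:SS, rest of the message.
        fields = line.split(maxsplit=4)
        stamp = datetime.strptime(fields[3], "%H:%M:%S")
        if prev is not None:
            rows.append((int((stamp - prev).total_seconds()), fields[4]))
        prev = stamp
    return rows


for seconds, message in gaps(LOG):
    print(f"{seconds:4d} s  {message}")
```

Run on this excerpt, the two large gaps are 608 seconds before the line that opens j._3 and 466 seconds before "recover cleaning up"; every other step completes within a second.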

Comment by Eliot Horowitz (Inactive) [ 06/Jan/12 ]

Looks like the file system was very slow throughout that period.

Taking 2 minutes to allocate a 1 GB journal file is very, very slow, and only the file system is involved in that step.
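
As a rough back-of-the-envelope check, the write throughput during preallocation can be estimated from the startup log in this ticket. This sketch assumes that each preallocation ends at the timestamp of the next log line, which the log does not state explicitly; the file size of 1073741824 bytes comes from the progress lines. The helper names are made up for illustration.

```python
from datetime import datetime

JOURNAL_FILE_BYTES = 1073741824  # denominator of the progress lines in the log

# (file, start timestamp, timestamp of the next log line), read off the startup log.
PREALLOC = [
    ("prealloc.0", "14:03:21", "14:05:12"),
    ("prealloc.1", "14:05:12", "14:06:49"),
    ("prealloc.2", "14:06:49", "14:08:37"),
]


def seconds_between(start, end):
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def throughput_mib_per_s(start, end, size_bytes=JOURNAL_FILE_BYTES):
    """Average write rate in MiB/s, assuming the whole file was written in the interval."""
    return size_bytes / (1024 * 1024) / seconds_between(start, end)


for name, start, end in PREALLOC:
    rate = throughput_mib_per_s(start, end)
    print(f"{name}: {seconds_between(start, end)} s, {rate:.1f} MiB/s")
```

Under that assumption each 1 GiB file took between 97 and 111 seconds, which works out to roughly 9 to 11 MiB/s of sequential write throughput.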

Comment by jitendra [ 06/Jan/12 ]

The hard disk is at a different physical location, on a SAN network, and the disk was faulty.
As a result, recovery took 16 minutes and the server only became available after 22 minutes, which is a big problem.

***** SERVER RESTARTED *****

Fri Jan 6 13:44:31 [initandlisten] MongoDB starting : pid=4477 port=20000 dbpath=/u01/shard3/master/ 64-bit host=ctxos-node-01
Fri Jan 6 13:44:31 [initandlisten] db version v2.0.2, pdfile version 4.5
Fri Jan 6 13:44:31 [initandlisten] git version: 514b122d308928517f5841888ceaa4246a7f18e3
Fri Jan 6 13:44:31 [initandlisten] build info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Fri Jan 6 13:44:31 [initandlisten] options: { bind_ip: "192.168.50.168", dbpath: "/u01/shard3/master/", journalCommitInterval: 2, logappend: true, logpath: "/usr/local/ct/depend/mongo/logs/mongod_20000.log", port: 20000, quiet: true, shardsvr: true }
Fri Jan 6 13:44:32 [initandlisten] journal dir=/u01/shard3/master/journal
Fri Jan 6 13:44:32 [initandlisten] recover begin
Fri Jan 6 13:44:32 [initandlisten] recover lsn: 42621633
Fri Jan 6 13:44:32 [initandlisten] recover /u01/shard3/master/journal/j._2
Fri Jan 6 13:44:32 [initandlisten] recover skipping application of section seq:22106183 < lsn:42621633
Fri Jan 6 13:44:33 [initandlisten] recover skipping application of section seq:22165763 < lsn:42621633
Fri Jan 6 13:44:33 [initandlisten] recover skipping application of section seq:22225343 < lsn:42621633
Fri Jan 6 13:44:34 [initandlisten] recover skipping application of section seq:22284933 < lsn:42621633
Fri Jan 6 13:44:34 [initandlisten] recover skipping application of section seq:22344533 < lsn:42621633
Fri Jan 6 13:44:35 [initandlisten] recover skipping application of section seq:22404113 < lsn:42621633
Fri Jan 6 13:44:36 [initandlisten] recover skipping application of section seq:22463683 < lsn:42621633
Fri Jan 6 13:44:36 [initandlisten] recover skipping application of section seq:22523273 < lsn:42621633
Fri Jan 6 13:44:37 [initandlisten] recover skipping application of section seq:22582843 < lsn:42621633
Fri Jan 6 13:44:38 [initandlisten] recover skipping application of section more...
Fri Jan 6 13:54:46 [initandlisten] recover /u01/shard3/master/journal/j._3
Fri Jan 6 14:02:32 [initandlisten] recover cleaning up
Fri Jan 6 14:02:32 [initandlisten] removeJournalFiles
Fri Jan 6 14:02:32 [initandlisten] recover done
Fri Jan 6 14:02:55 [initandlisten] preallocateIsFaster=true 432.26
Fri Jan 6 14:02:56 [initandlisten] preallocateIsFaster=true 6.36
Fri Jan 6 14:03:21 [initandlisten] preallocateIsFaster=true 50.2
Fri Jan 6 14:03:21 [initandlisten] preallocateIsFaster check took 48.478 secs
Fri Jan 6 14:03:21 [initandlisten] preallocating a journal file /u01/shard3/master/journal/prealloc.0
136314880/1073741824 12%
178257920/1073741824 16%
209715200/1073741824 19%
251658240/1073741824 23%
283115520/1073741824 26%
314572800/1073741824 29%
377487360/1073741824 35%
398458880/1073741824 37%
429916160/1073741824 40%
461373440/1073741824 42%
482344960/1073741824 44%
513802240/1073741824 47%
545259520/1073741824 50%
587202560/1073741824 54%
639631360/1073741824 59%
681574400/1073741824 63%
713031680/1073741824 66%
723517440/1073741824 67%
765460480/1073741824 71%
817889280/1073741824 76%
849346560/1073741824 79%
870318080/1073741824 81%
922746880/1073741824 85%
996147200/1073741824 92%
1038090240/1073741824 96%
Fri Jan 6 14:05:12 [initandlisten] preallocating a journal file /u01/shard3/master/journal/prealloc.1
136314880/1073741824 12%
178257920/1073741824 16%
209715200/1073741824 19%
241172480/1073741824 22%
283115520/1073741824 26%
314572800/1073741824 29%
377487360/1073741824 35%
419430400/1073741824 39%
450887680/1073741824 41%
492830720/1073741824 45%
513802240/1073741824 47%
545259520/1073741824 50%
587202560/1073741824 54%
629145600/1073741824 58%
660602880/1073741824 61%
702545920/1073741824 65%
734003200/1073741824 68%
765460480/1073741824 71%
796917760/1073741824 74%
828375040/1073741824 77%
870318080/1073741824 81%
901775360/1073741824 83%
933232640/1073741824 86%
975175680/1073741824 90%
1017118720/1073741824 94%
1059061760/1073741824 98%
Fri Jan 6 14:06:49 [initandlisten] preallocating a journal file /u01/shard3/master/journal/prealloc.2
115343360/1073741824 10%
136314880/1073741824 12%
167772160/1073741824 15%
209715200/1073741824 19%
241172480/1073741824 22%
283115520/1073741824 26%
314572800/1073741824 29%
346030080/1073741824 32%
367001600/1073741824 34%
408944640/1073741824 38%
450887680/1073741824 41%
482344960/1073741824 44%
503316480/1073741824 46%
545259520/1073741824 50%
587202560/1073741824 54%
597688320/1073741824 55%
650117120/1073741824 60%
681574400/1073741824 63%
713031680/1073741824 66%
744488960/1073741824 69%
786432000/1073741824 73%
817889280/1073741824 76%
849346560/1073741824 79%
859832320/1073741824 80%
901775360/1073741824 83%
943718400/1073741824 87%
975175680/1073741824 90%
1006632960/1073741824 93%
1038090240/1073741824 96%
Fri Jan 6 14:08:37 [initandlisten] waiting for connections on port 20000
Fri Jan 6 14:08:37 [websvr] admin web console waiting for connections on port 21000

Comment by Eliot Horowitz (Inactive) [ 06/Jan/12 ]

How did you unmount?
Was it a single drive or a raid?
Can you send the log?

You shouldn't have been allowed to unmount with mongod running...

Generated at Thu Feb 08 03:06:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.