[SERVER-16642] Error repairing database Created: 23/Dec/14  Updated: 08/Jan/15  Resolved: 08/Jan/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.4.9, 2.6.4
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: carl dong Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Hi Team,

Could you advise how I can track down this issue?

I am doing an incremental backup of the database, but it failed when fetching the oplogs, so I ran the repair command and it failed again.
Below is the error message:

2014-12-23T13:53:44.007+0800 [FileAllocator] allocating new datafile /home/ISH/Data/_tmp_repairDatabase_0/local/local.41, filling with zeroes...
2014-12-23T13:53:44.023+0800 [FileAllocator] done allocating datafile /home/ISH/Data/_tmp_repairDatabase_0/local/local.41, size: 2047MB, took 0.016 secs
2014-12-23T13:53:44.024+0800 [FileAllocator] allocating new datafile /home/ISH/Data/_tmp_repairDatabase_0/local/local.42, filling with zeroes...
2014-12-23T13:53:44.059+0800 [FileAllocator] done allocating datafile /home/ISH/Data/_tmp_repairDatabase_0/local/local.42, size: 2047MB, took 0.034 secs
2014-12-23T13:53:44.108+0800 [initandlisten] warning Listener::getElapsedTimeMillis returning 0ms
2014-12-23T13:53:44.174+0800 [initandlisten] warning Listener::getElapsedTimeMillis returning 0ms
2014-12-23T13:53:44.245+0800 [initandlisten] Assertion: 10334:BSONObj size: 1634624622 (0x616E646E) is invalid. Size must be between 0 and 16793600(16MB) First element: e: ?type=109
2014-12-23T13:53:44.332+0800 [initandlisten] local.oplog.rs 0x11e6111 0x1187e49 0x116c9f6 0x116cf4c 0x77329b 0x8b6d28 0xe016bf 0x767d4e 0x76aa0c 0x76c62f 0x76cedb 0x76d475 0x76d699 0x31dc81ed1d 0x764329
mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
mongod(_ZN5mongo11msgassertedEiPKc+0xe6) [0x116c9f6]
mongod() [0x116cf4c]
mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x41b) [0x77329b]
mongod(_ZN5mongo10Collection6docForERKNS_7DiskLocE+0x68) [0x8b6d28]
mongod(_ZN5mongo14repairDatabaseESsbb+0x24ef) [0xe016bf]
mongod(_ZN5mongo11doDBUpgradeERKSsPNS_14DataFileHeaderE+0x5e) [0x767d4e]
mongod() [0x76aa0c]
mongod(_ZN5mongo14_initAndListenEi+0x5df) [0x76c62f]
mongod(_ZN5mongo13initAndListenEi+0x1b) [0x76cedb]
mongod() [0x76d475]
mongod(main+0x9) [0x76d699]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x31dc81ed1d]
mongod() [0x764329]
2014-12-23T13:53:44.341+0800 [initandlisten] cleaning up failed repair db: local path: /home/ISH/Data/_tmp_repairDatabase_0
2014-12-23T13:53:46.279+0800 [initandlisten] exception in initAndListen: 10334 BSONObj size: 1634624622 (0x616E646E) is invalid. Size must be between 0 and 16793600(16MB) First element: e: ?type=109, terminating
2014-12-23T13:53:46.279+0800 [initandlisten] dbexit:

Thanks in advance,

Carl Dong



 Comments   
Comment by Ramon Fernandez Marina [ 08/Jan/15 ]

Thanks for the update carl.dong@windfindtech.com, happy to hear that your replica set is working well again.

I forgot to ask whether you would consider uploading your database files; in the absence of system logs pointing to storage issues, analyzing the database files may help determine how the BSON corruption happened. Note also that if this was a hardware issue and you didn't replace your storage, the issue may appear again. If that happens, feel free to re-open this ticket.

Regards,
Ramón.

Comment by carl dong [ 07/Jan/15 ]

I didn't find any disk error messages in my system log, so I removed all data files and re-synced from another node. Now my DB works fine.

Thanks for your help.

Comment by Ramon Fernandez Marina [ 06/Jan/15 ]

carl.dong@windfindtech.com, something that may help would be to search the system logs for error messages pointing to disk issues. That way we can be sure we've found the true cause of the problem. Can you please take a look at your system logs and post any error messages you may find?

Comment by Ramon Fernandez Marina [ 05/Jan/15 ]

carl.dong@windfindtech.com, the error message is indicative of data corruption on disk, often caused by flaky storage. At this stage you may need to re-sync from a healthy node after making sure your storage is healthy.
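One way to see why the assertion points at on-disk corruption rather than a genuinely oversized document: a BSON document begins with a little-endian int32 holding its total size, followed by the first element's type byte and its NUL-terminated field name. The minimal sketch below (plain Python, no MongoDB driver assumed) simply re-packs the values printed in the log (size 1634624622, type 109, field name "e") back into raw bytes. They decode as the ASCII text "ndname", and 109 is not a defined BSON type, which suggests the record header was overwritten by unrelated string data. This is an illustration of the byte layout only, not a diagnostic tool.

```python
import struct

# Values copied from the "Assertion: 10334" line in the log.
reported_size = 1634624622   # claimed total document size
first_type = 109             # claimed type byte of the first element
first_name = b"e"            # claimed name of the first element
max_size = 16793600          # limit quoted in the same log line

# Rebuild the leading bytes as mongod would have read them:
# int32 size (little-endian) + type byte + field name.
raw = struct.pack("<i", reported_size) + bytes([first_type]) + first_name

print(hex(reported_size))         # 0x616e646e
print(raw.decode("ascii"))        # ndname
print(reported_size <= max_size)  # False: far above the limit
```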

Comment by carl dong [ 04/Jan/15 ]

Can anyone advise on this issue?

Generated at Thu Feb 08 03:41:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.