|
Hi Daniel5,
Yes, I would recommend taking another file copy. This will give mongodump --repair the best chance of success. However, please note that we cannot guarantee that this option will succeed, and if it does succeed, some manual intervention may be required to remove duplicated documents.
Kind regards,
Kelsey
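
A minimal sketch of the kind of manual clean-up mentioned above, assuming the duplicated documents share the same _id and have already been loaded into Python as dicts. The function name drop_duplicate_ids is hypothetical, and keeping the first copy is only one possible policy; the set-aside copies should still be reviewed by hand.

```python
def drop_duplicate_ids(docs):
    """Split documents into first-seen copies and later duplicates by _id.

    Assumes every document has a hashable "_id" value. Which copy is the
    "right" one is a judgment call; this sketch simply keeps the first.
    """
    seen = set()
    unique = []
    duplicates = []
    for doc in docs:
        key = doc["_id"]
        if key in seen:
            duplicates.append(doc)  # set aside for manual review
        else:
            seen.add(key)
            unique.append(doc)
    return unique, duplicates


if __name__ == "__main__":
    sample = [
        {"_id": 1, "name": "a"},
        {"_id": 2, "name": "b"},
        {"_id": 1, "name": "a (second copy)"},
    ]
    kept, set_aside = drop_duplicate_ids(sample)
    print(len(kept), "kept,", len(set_aside), "set aside")
```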
|
|
Hi Kelsey
Now the copy of the DB on my machine fails to start up. I guess this is related to the last repair process, which crashed (see further up in this case). I get the errors below in the log.
2017-09-27T09:42:24.734-0700 I CONTROL [main] Trying to start Windows service 'MongoDB'
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] MongoDB starting : pid=8580 port=27017 dbpath=F:\mongo-dbs 64-bit host=Daniel2016
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] db version v3.4.7
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] git version: cf38c1b8a0a8dca4a11737581beafef4fe120bcd
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1u-fips 22 Sep 2016
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] allocator: tcmalloc
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] modules: none
2017-09-27T09:42:24.735-0700 I CONTROL [initandlisten] build environment:
2017-09-27T09:42:24.736-0700 I CONTROL [initandlisten] distmod: 2008plus-ssl
2017-09-27T09:42:24.736-0700 I CONTROL [initandlisten] distarch: x86_64
2017-09-27T09:42:24.736-0700 I CONTROL [initandlisten] target_arch: x86_64
2017-09-27T09:42:24.736-0700 I CONTROL [initandlisten] options: { config: "D:\db-mongo\mongo-dev.conf", net: { port: ... }, service: true, setParameter: { cursorTimeoutMillis: ... }, storage: { dbPath: ... }, systemLog: { destination: ... } }
2017-09-27T09:42:24.736-0700 W - [initandlisten] Detected unclean shutdown - F:\mongo-dbs\mongod.lock is not empty.
2017-09-27T09:42:24.741-0700 I STORAGE [initandlisten] **************
old lock file: F:\mongo-dbs\mongod.lock. probably means unclean shutdown,
but there are no journal files to recover.
this is likely human error or filesystem corruption.
please make sure that your journal directory is mounted.
found 3 dbs.
see: http://dochub.mongodb.org/core/repair for more information
*************
2017-09-27T09:42:24.742-0700 I STORAGE [initandlisten] exception in initAndListen: 12596 old lock file, terminating
2017-09-27T09:42:24.742-0700 I NETWORK [serviceStopWorker] shutdown: going to close listening sockets...
2017-09-27T09:42:24.742-0700 I NETWORK [serviceStopWorker] shutdown: going to flush diaglog...
2017-09-27T09:42:24.742-0700 I CONTROL [serviceStopWorker] now exiting
I now cannot run mongodump --repair as the DB fails to start up.
I tried mongod --repair but it froze my machine (spec further up) after 15 minutes.
I can take another file copy of the DB from the server but this means shutting down production. As it takes quite some time to copy the 400GB I can schedule this for the weekend to minimise impact. Should I do that or is there another way?
Kind regards,
Daniel
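
A minimal sketch for spotting the startup failure shown in the log above, assuming the log is available as plain text lines. The patterns are copied from this excerpt (v3.4.7) and may be worded differently in other MongoDB versions; the names SIGNALS and scan_mongod_log are hypothetical.

```python
import re

# Patterns taken from the log excerpt in this comment; illustrative only,
# since the exact wording can differ between MongoDB versions.
SIGNALS = {
    "unclean_shutdown": re.compile(r"Detected unclean shutdown"),
    "no_journal": re.compile(r"no journal files to recover"),
    "old_lock_file": re.compile(r"exception in initAndListen: 12596 old lock file"),
}


def scan_mongod_log(lines):
    """Return the names of all signals that appear in the given log lines."""
    found = set()
    for line in lines:
        for name, pattern in SIGNALS.items():
            if pattern.search(line):
                found.add(name)
    return found


if __name__ == "__main__":
    excerpt = [
        "2017-09-27T09:42:24.736-0700 W - [initandlisten] Detected unclean shutdown - F:\\mongo-dbs\\mongod.lock is not empty.",
        "but there are no journal files to recover.",
        "2017-09-27T09:42:24.742-0700 I STORAGE [initandlisten] exception in initAndListen: 12596 old lock file, terminating",
    ]
    print(sorted(scan_mongod_log(excerpt)))
```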
|
|
Hi Daniel5,
Please note that mongod --repair and mongodump --repair are different operations that utilize different repair algorithms. From the terminal you should see something like:
$ mongodump --repair
2017-09-27T10:51:32.266-0400 writing repair of admin.system.indexes to
2017-09-27T10:51:32.266-0400 repair cursor found 1 document in admin.system.indexes
2017-09-27T10:51:32.266-0400 done dumping admin.system.indexes (0 documents)
2017-09-27T10:51:32.266-0400 writing repair of admin.system.version to
2017-09-27T10:51:32.267-0400 repair cursor found 1 document in admin.system.version
2017-09-27T10:51:32.267-0400 done dumping admin.system.version (0 documents)
2017-09-27T10:51:32.267-0400 writing repair of test.foo to
2017-09-27T10:51:32.267-0400 repair cursor found 1 document in test.foo
2017-09-27T10:51:32.267-0400 done dumping test.foo (0 documents)
Thanks,
Kelsey
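
A minimal sketch for summarizing mongodump --repair terminal output of the shape shown in this comment, assuming it has been captured to text. The regular expressions are derived only from the sample lines here and may not match other mongodump versions; the function name summarize_repair_output is hypothetical.

```python
import re

# Both patterns are derived from the sample output in this comment.
FOUND_RE = re.compile(r"repair cursor found (\d+) documents? in (\S+)")
DONE_RE = re.compile(r"done dumping (\S+) \((\d+) documents?\)")


def summarize_repair_output(lines):
    """Map each namespace to the counts reported on its 'found' and 'done' lines."""
    summary = {}
    for line in lines:
        match = FOUND_RE.search(line)
        if match:
            summary.setdefault(match.group(2), {})["found"] = int(match.group(1))
            continue
        match = DONE_RE.search(line)
        if match:
            summary.setdefault(match.group(1), {})["dumped"] = int(match.group(2))
    return summary


if __name__ == "__main__":
    sample = [
        "2017-09-27T10:51:32.267-0400 writing repair of test.foo to",
        "2017-09-27T10:51:32.267-0400 repair cursor found 1 document in test.foo",
        "2017-09-27T10:51:32.267-0400 done dumping test.foo (0 documents)",
    ]
    for namespace, counts in summarize_repair_output(sample).items():
        print(namespace, counts)
```

A per-namespace summary like this makes it easier to paste the relevant counts into a ticket when the full output is long.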
|
|
Hi Kelsey
Not sure what you mean by output?
I have attached the log file as well as the mini dump. Please let me know what else you require.
Kind regards,
Daniel
|
|
Hi Daniel5,
Thank you for answering Mark's questions. Unfortunately, in cases like this it is very difficult to identify the root cause of the corruption. Would you please provide the output of mongodump --repair?
Kind regards,
Kelsey
|
|
Thank you, Mark, for your feedback and attention to the case.
Unfortunately this is the only DB we have. I had to delete the other replica set member a while ago and thought this would be no problem, as I could just re-sync it. This is now unfortunately not possible due to the problem described in this case. I need a solution to get replication going again, even if it means a bit of data loss. At present we are also stuck on v3.2, as upgrading the DB is not possible either.
Please find the answers to your questions below.
- What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? DP: Locally. Are the disks SSDs or HDDs? DP: 2TB Seagate SATA HDD. What kind of RAID and/or volume management system are you using? DP: No RAID, Linux Centos 7 ext4. The box is sometimes unresponsive in which case we switch it off completely and then reboot.
- Would you please check the integrity of your disks? DP: Done via badblocks, result is: Pass completed, 0 bad blocks found. (0/0/0 errors)
- Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through. DP: It started as a MongoDB 2.4 DB for a few years. It was then upgraded to 3.2 in 2016.
- Have you manipulated (copied or moved) the underlying database files? If so, was mongod running? DP: Yes, the DB was originally on another server. Before copying mongod had been stopped. The data was copied onto an external USB3 hard drive and then shipped 1000 km via courier services. This was before the upgrade from v2.4.
- Have you ever restored this instance from backups? DP: No. Never made one besides copying the entire DB. We have changed this recently and now run mongodump. Unfortunately this is not possible for this DB, as described in this case.
- What method do you use to create backups? DP: Stop mongod and then copy all files. This is done at most once a year, sometimes only every two years. We rely on the replica sets being the backup.
- When was the underlying filesystem last checked and is it currently marked clean? DP: Never checked the filesystem. Let me know if you need something else.
I am very much looking forward to hearing from you.
Daniel
|
|
Hello Daniel5,
Thank you for the report. Unfortunately, this error indicates that there was corruption on the disk. In this situation, my best recommendation would be to resync the affected node or restore from a backup if possible.
To get an understanding of how this may have happened, I'd like to request some information:
- What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
- Would you please check the integrity of your disks?
- Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
- Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
- Have you ever restored this instance from backups?
- What method do you use to create backups?
- When was the underlying filesystem last checked and is it currently marked clean?
Thanks,
Mark
|
Generated at Thu Feb 08 04:26:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.