-
Type: Question
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.4.1
-
Component/s: WiredTiger
-
Labels:None
Hi,
I very need your help to identify the source of the problem. I'll try to describe an issue in details.
What we have:
1. Soyoustart (OVH) server
2. Ubuntu 16.04 x64
3. Kernel 4.4.52 compiled using OVH config (ftp://ftp.ovh.net/made-in-ovh/bzImage/4.4.52/config-4.4.52-mod-std-ipv6-64)
4. ZFS 0.6.5.6 (installed as kernel module using apt-get install zfs-dkms)
5. LXD 2.12
6. MongoDB 3.4.1 - 3.4.3 inside LXC container
7. 600+ GB database
I use the following command to restore collection:
/opt/mongodb/bin/mongorestore -h 10.10.10.10 --authenticationDatabase=admin -u admin -d test -c fs.files --drop /backup/mongodb/test/fs.files.bson
After that I run db.fs.files.validate(true) in mongo shell and get the following:
"ns" : "test.fs.files", "nIndexes" : 5, "keysPerIndex" : { "test.fs.files.$_id_" : 24757148, "test.fs.files.$metadata.obj_id_1_metadata.type_1" : 28164517, "test.fs.files.$metadata.export_1" : 30710, "test.fs.files.$md5_1_metadata.type_1" : 24757148, "test.fs.files.$filename_1_uploadDate_1" : 24757148 }, "indexDetails" : { "test.fs.files.$_id_" : { "valid" : false }, "test.fs.files.$filename_1_uploadDate_1" : { "valid" : false }, "test.fs.files.$md5_1_metadata.type_1" : { "valid" : false }, "test.fs.files.$metadata.export_1" : { "valid" : false }, "test.fs.files.$metadata.obj_id_1_metadata.type_1" : { "valid" : false } }, "valid" : false, "warnings" : [ ], "errors" : [ "[1491902585:700463][865:0x7fdfc204e700], file:test/collection/0-7802303945598224820.wt, WT_SESSION.verify: read checksum error for 12288B block at offset 3095171072: calculated block checksum of 2872646372 doesn't match expected checksum of 1247204578", "[1491902585:801700][865:0x7fdfc204e700], file:test/collection/0-7802303945598224820.wt, WT_SESSION.verify: checkpoint ranges never verified: 3", "[1491902585:801999][865:0x7fdfc204e700], file:test/collection/0-7802303945598224820.wt, WT_SESSION.verify: file ranges never verified: 3", "verify() returned WT_ERROR: non-specific WiredTiger error. This indicates structural damage. Not examining individual documents.", "One or more indexes contain invalid index entries." ], "advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.", "ok" : 1
The problem is not connected with this collection only, because I started to get this kind of errors with another collections' files and with index files 4 days ago (log example is attached).
Hard drives are ok, their smart shows zero reallocated sectors and only 1010 power on hours. Also zpool scrub doesn't find any errors.
$ zpool status pool: z state: ONLINE scan: scrub repaired 0 in 2h46m with 0 errors on Sun Apr 9 03:10:16 2017 config: NAME STATE READ WRITE CKSUM z ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 sda3 ONLINE 0 0 0 sdb3 ONLINE 0 0 0
I don't think that this is a ZFS problem since the dump was created successfully and mongorestore doesn't print any errors when loading it.
What is the best way to find buggy component (kernel, zfs, lxd, mongodb) in this situation?