[SERVER-34743] How to recover from "WiredTiger.wt, WT_CURSOR.next: read checksum error" ? Created: 29/Apr/18 Updated: 27/Jul/18 Resolved: 01/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Froz Costa | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | envns, rge, rpu, trcf, wtc | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi, Today I ran into the problem that many others have already reported: the server crashed, the database is corrupted, and I can't bring it back up. I've tried starting mongod with --repair, with no success. How can I recover from this? From many posts I can see that you generate the repaired files and then we can start mongod. But really, how can we recover from this problem without opening a ticket or depending on someone to fix the checksum in the WiredTiger file? I am attaching the files for analysis. BTW, I use --directoryperdb on my mongod instance. – LOGS
|
| Comments |
| Comment by Kelsey Schubert [ 01/May/18 ] |
|
Hi daniel.froz, Unfortunately, this error indicates that there was corruption on the disk, most often caused by a faulty storage layer. In this situation, our best recommendation would be to resync the affected node or restore from a backup if possible. To prevent this type of problem in the future please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures.
Thank you, |
| Comment by Daniel Froz Costa [ 01/May/18 ] |
|
Hi Kelsey, now it's failing with error 100:
about to fork child process, waiting until server is ready for connections.
forked process: 1252
ERROR: child process failed, exited with error number 100
From the logs I can see the following:
2018-05-01T10:45:18.111-0300 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-05-01T10:45:19.994-0300 E STORAGE [initandlisten] WiredTiger error (-31802) [1525182319:990528][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: unable to read root page from file:sizeStorer.wt: WT_ERROR: non-specific WiredTiger error
2018-05-01T10:45:19.998-0300 E STORAGE [initandlisten] WiredTiger error (-31802) [1525182319:998130][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: operation apply failed during recovery: operation type 4 at LSN 111/18702464: WT_ERROR: non-specific WiredTiger error
2018-05-01T10:45:19.998-0300 E STORAGE [initandlisten] WiredTiger error (0) [1525182319:998213][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: WiredTiger is unable to read the recovery log.
2018-05-01T10:45:19.998-0300 E STORAGE [initandlisten] WiredTiger error (0) [1525182319:998226][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: This may be due to the log files being encrypted, being from an older version or due to corruption on disk
2018-05-01T10:45:19.998-0300 E STORAGE [initandlisten] WiredTiger error (0) [1525182319:998232][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: You should confirm that you have opened the database with the correct options including all encryption and compression options
2018-05-01T10:45:19.998-0300 E STORAGE [initandlisten] WiredTiger error (-31802) [1525182319:998249][1252:0x7f933903ecc0], file:sizeStorer.wt, txn-recover: Recovery failed: WT_ERROR: non-specific WiredTiger error
2018-05-01T10:45:20.018-0300 I - [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 267
2018-05-01T10:45:20.029-0300 I STORAGE [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
I also tried running with the --repair option, with no success. Cheers, Daniel Froz |
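To triage a startup log like the one in this comment, a short script can pull out each `WiredTiger error (N)` line and put a name to the numeric code. The following is a minimal Python sketch, assuming the single-line log layout shown above; the regular expression, `parse_wt_error`, `triage` and the `WtError` tuple are hypothetical names for illustration, and the code table lists only a handful of return codes from WiredTiger's public header (-31802 is `WT_ERROR`, which matches the "WT_ERROR: non-specific WiredTiger error" text in the log).

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# A few return codes from WiredTiger's public header (wiredtiger.h).
# 0 appears on the informational lines that carry no error code.
WT_ERROR_NAMES = {
    0: "none",
    -31800: "WT_ROLLBACK",
    -31801: "WT_DUPLICATE_KEY",
    -31802: "WT_ERROR",
    -31803: "WT_NOTFOUND",
    -31804: "WT_PANIC",
}

# Assumed layout (taken from the log above):
# "<ts> <sev> STORAGE [<ctx>] WiredTiger error (<code>) [..][..], file:<name>, <message>"
WT_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<sev>[A-Z]) STORAGE +\[(?P<ctx>[^\]]+)\] "
    r"WiredTiger error \((?P<code>-?\d+)\) \[[^\]]*\]\[[^\]]*\], "
    r"(?P<uri>file:[^,]+), (?P<msg>.*)$"
)


class WtError(NamedTuple):
    timestamp: str
    severity: str
    code: int
    name: str
    uri: str
    message: str


def parse_wt_error(line: str) -> Optional[WtError]:
    """Return the fields of a WiredTiger error line, or None for any other line."""
    m = WT_LINE.match(line.strip())
    if m is None:
        return None
    code = int(m["code"])
    return WtError(
        timestamp=m["ts"],
        severity=m["sev"],
        code=code,
        name=WT_ERROR_NAMES.get(code, "unknown"),
        uri=m["uri"],
        message=m["msg"],
    )


def triage(lines: Iterable[str]) -> Iterator[WtError]:
    """Yield only the parsed WiredTiger error lines, skipping everything else."""
    for raw in lines:
        err = parse_wt_error(raw)
        if err is not None:
            yield err
```

Passing an open log file to `triage` and printing `err.uri` and `err.name` for each result shows that every WiredTiger error line in the log above refers to `file:sizeStorer.wt` during the `txn-recover` step. Lines in other formats (for example the `I - [initandlisten] Assertion:` line) are simply skipped rather than guessed at.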
| Comment by Kelsey Schubert [ 30/Apr/18 ] |
|
Hi daniel.froz, Thank you for the report. I've attached a repair attempt, repair_attempt.tar.gz. Thank you, |
| Comment by Daniel Froz Costa [ 30/Apr/18 ] |
|
Hi, I am having trouble attaching the file as requested: "File XXXX was not uploaded. An internal error has occurred. Please contact your administrator..." I can see it's a text file, so I am copying its content below:
– START HERE
WiredTiger version string
WiredTiger 2.9.2: (December 23, 2016)
WiredTiger version
major=2,minor=9,patch=2
access_pattern_hint=none,allocation_size=4KB,app_metadata=,block_allocation=best,block_compressor=,cache_resident=false,checkpoint=(WiredTigerCheckpoint.100331=(addr="01c00781e406748afbc01481e4a0963c8bc01581e4c85ddb83808080e30c4fc0e305efc0",order=100331,time=1525036325,size=401408,write_gen=536294)),checkpoint_lsn=(111,18702464),checksum=uncompressed,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=0,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=S,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=75,value_format=S,version=(major=1,minor=1)
– END HERE
Many thanks for the prompt reply! Really appreciated! Cheers, Daniel Froz |
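The long metadata string pasted above uses WiredTiger's configuration syntax: comma-separated `key=value` pairs, where a value may be a double-quoted string or a parenthesized group that nests further pairs. A minimal Python sketch for reading such a string is below; `split_top_level`, `parse_value` and `parse_config` are hypothetical helpers, the grammar is deliberately simplified (no escaped quotes, no `[...]` lists), and treating a bare key inside a `key=value` group as `True` is an assumption of this sketch.

```python
from typing import Union

# A parsed value: a plain string, a bool flag, a nested dict (key=value group)
# or a list (a group of bare items, e.g. "(111,18702464)").
Value = Union[str, bool, dict, list]


def split_top_level(s: str) -> list:
    """Split on commas that are outside parentheses and double quotes."""
    items, depth, in_quote, start = [], 0, False, 0
    for i, ch in enumerate(s):
        if ch == '"':
            in_quote = not in_quote
        elif in_quote:
            continue  # commas and parens inside quotes are literal
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(s[start:i])
            start = i + 1
    items.append(s[start:])
    # Drop empty pieces, e.g. from a trailing comma.
    return [it for it in items if it.strip()]


def parse_value(v: str) -> Value:
    """Turn one raw value into a nested structure, an unquoted string or a bare token."""
    v = v.strip()
    if v.startswith("(") and v.endswith(")"):
        return parse_config(v[1:-1])
    if len(v) >= 2 and v.startswith('"') and v.endswith('"'):
        return v[1:-1]
    return v


def parse_config(s: str) -> Union[dict, list]:
    """Parse a WiredTiger-style config string into nested dicts/lists (simplified grammar)."""
    items = split_top_level(s)
    if items and all("=" not in it for it in items):
        # Positional group such as checkpoint_lsn=(111,18702464).
        return [parse_value(it) for it in items]
    result = {}
    for it in items:
        key, sep, value = it.partition("=")
        # Assumption of this sketch: a bare key in a key=value group is a True flag.
        result[key.strip()] = parse_value(value) if sep else True
    return result
```

Applied to the pasted metadata, `parse_config(...)["checkpoint_lsn"]` yields `["111", "18702464"]`, the same LSN (111/18702464) at which the `txn-recover` step in the earlier log reported that applying an operation failed. Empty values such as `app_metadata=` come back as empty strings rather than being dropped, so every key in the original string is preserved.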
| Comment by Ramon Fernandez Marina [ 30/Apr/18 ] |
|
daniel.froz, if you upload the WiredTiger.turtle file we can attempt to repair the catalog files. Regards, |
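Before uploading WiredTiger.turtle, it can help to check its contents locally without touching the file. Judging by the excerpt Daniel pasted, where each label line (such as `WiredTiger version`) is followed by its value line, a minimal Python sketch can pair up alternating lines. The alternating key/value layout is an assumption taken from that excerpt, and `read_turtle` and `load_turtle` are hypothetical helpers; the sketch only reads the file and never writes to it.

```python
from pathlib import Path


def read_turtle(text: str) -> dict:
    """Pair alternating key/value lines (layout assumed from the pasted excerpt)."""
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        # Refuse to guess a pairing if a key or value line is missing.
        raise ValueError("expected alternating key/value lines")
    return dict(zip(lines[0::2], lines[1::2]))


def load_turtle(dbpath: str) -> dict:
    """Read (never write) <dbpath>/WiredTiger.turtle; dbpath is the mongod --dbpath."""
    return read_turtle(Path(dbpath, "WiredTiger.turtle").read_text(encoding="utf-8"))
```

Raising on an odd line count is a deliberate choice: a truncated or partially copied file then fails loudly instead of silently shifting every value onto the wrong key.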