[SERVER-32523] file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error Created: 03/Jan/18  Updated: 14/Aug/18  Resolved: 17/Jan/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Wong [X] Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: docker, envc, rns, rpu, szs, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File repair_attempt.tar.gz     File sizeStorer.wt    
Operating System: Linux
Participants:

 Description   

We run MongoDB 3.4.3 in an kubernetes cluster, the Mongo pod exsit and WiredTiger data files are corrupted. I copied data files to another volume and run Mongo repair command: mongod --repair, there’re errors like: WiredTiger error (-31802) [1514980856:210666][151:0x7fda0a758cc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error

Could you please have a look at this? Thank you!

root@mongo2-rpg5s:/mnt/tmp# mongod --port 16000 --repair --dbpath /mnt/tmp/
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] MongoDB starting : pid=151 port=16000 dbpath=/mnt/tmp/ 64-bit host=mongo2-rpg5s
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] db version v3.4.3
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] git version: f07437fb5a6cca07c10bafa78365456eb1d6d5e1
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] modules: none
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] build environment:
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten]     distmod: debian81
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-01-03T12:00:56.191+0000 I CONTROL  [initandlisten] options: { net: { port: 16000 }, repair: true, storage: { dbPath: "/mnt/tmp/" } }
2018-01-03T12:00:56.191+0000 W -        [initandlisten] Detected unclean shutdown - /mnt/tmp/mongod.lock is not empty.
2018-01-03T12:00:56.196+0000 I -        [initandlisten] Detected data files in /mnt/tmp/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-01-03T12:00:56.196+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-01-03T12:00:56.196+0000 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2018-01-03T12:00:56.196+0000 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=7511M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-01-03T12:00:56.210+0000 E STORAGE  [initandlisten] WiredTiger error (-31802) [1514980856:210666][151:0x7fda0a758cc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2018-01-03T12:00:56.210+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1514980856:210726][151:0x7fda0a758cc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2018-01-03T12:00:56.210+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1514980856:210737][151:0x7fda0a758cc0], file:WiredTiger.wt, connection: This may be due tothe database files being encrypted, being from an older version or due to corruption on disk
2018-01-03T12:00:56.210+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1514980856:210748][151:0x7fda0a758cc0], file:WiredTiger.wt, connection: You should confirmthat you have opened the database with the correct options including all encryption and compression options
2018-01-03T12:00:56.211+0000 I -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 251
2018-01-03T12:00:56.211+0000 I STORAGE  [initandlisten] exception in initAndListen: 28718 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-01-03T12:00:56.211+0000 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-01-03T12:00:56.211+0000 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-16000.sock
2018-01-03T12:00:56.211+0000 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2018-01-03T12:00:56.211+0000 I CONTROL  [initandlisten] now exiting
2018-01-03T12:00:56.211+0000 I CONTROL  [initandlisten] shutting down with code:100



 Comments   
Comment by Kelsey Schubert [ 24/Jan/18 ]

Unfortunately, that's correct. These files appear to have been corrupted at the operating system level and we do not have any tools that would enable recovery.

Kind regards,
Kelsey

Comment by Kevin Wong [X] [ 24/Jan/18 ]

Hi Kelsey T Schubert ,

Thank you very much!
We didn’t run on replication mode and has no backups, so there is no solution?
We’re trying these suggestions in key environments.

Kind regards,
Kevin

Comment by Kelsey Schubert [ 17/Jan/18 ]

Hi KevinWong,

Unfortunately, this error indicates that there was corruption on the disk, most often cause by a faulty storage layer. In this situation, our best recommendation would be to resync the affected node or restore from a backup if possible.

To prevent this type of problem in the future please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures.

Kind regards,
Kelsey

Comment by Kevin Wong [X] [ 15/Jan/18 ]

Hi Kelsey T Schubert,
Is there any progress? thanks.

Comment by Kevin Wong [X] [ 10/Jan/18 ]

Hi Kelsey,
Thanks for your help.
Unfortunately, we encountered a new problem in the repair process with the attachment you provided.The error message is as follows, the file sizeStorer.wt is included in the attachment. Please help to take a look.

Thank you,
Kevin

root@mongo2-rpg5s:/mnt/tmp# mongod --port 16000 --repair --dbpath /mnt/tmp/
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] MongoDB starting : pid=267 port=16000 dbpath=/mnt/tmp/ 64-bit host=mongo2-rpg5s
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] db version v3.4.3
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] git version: f07437fb5a6cca07c10bafa78365456eb1d6d5e1
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1t 3 May 2016
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] allocator: tcmalloc
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] modules: none
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] build environment:
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] distmod: debian81
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] distarch: x86_64
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] target_arch: x86_64
2018-01-10T02:04:48.576+0000 I CONTROL [initandlisten] options: { net:

{ port: 16000 }

, repair: true, storage:

{ dbPath: "/mnt/tmp/" }

}
2018-01-10T02:04:48.576+0000 W - [initandlisten] Detected unclean shutdown - /mnt/tmp/mongod.lock is not empty.
2018-01-10T02:04:48.580+0000 I - [initandlisten] Detected data files in /mnt/tmp/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-01-10T02:04:48.580+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2018-01-10T02:04:48.580+0000 I STORAGE [initandlisten] Detected WT journal files. Running recovery from last checkpoint.
2018-01-10T02:04:48.580+0000 I STORAGE [initandlisten] journal to nojournal transition config: create,cache_size=7511M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-01-10T02:04:48.742+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1515549888:742053][267:0x7f61063a8cc0], file:sizeStorer.wt, txn-recover: unable to read root page from file:sizeStorer.wt: WT_ERROR: non-specific WiredTiger error
2018-01-10T02:04:48.742+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1515549888:742140][267:0x7f61063a8cc0], file:sizeStorer.wt, txn-recover: operation apply failed during recovery: operation type 4 at LSN 7/2560: WT_ERROR: non-specific WiredTiger error
2018-01-10T02:04:48.742+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1515549888:742164][267:0x7f61063a8cc0], file:sizeStorer.wt, txn-recover: Recovery failed: WT_ERROR: non-specific WiredTiger error
2018-01-10T02:04:48.763+0000 I - [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 251
2018-01-10T02:04:48.854+0000 I STORAGE [initandlisten] exception in initAndListen: 28718 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-01-10T02:04:48.854+0000 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2018-01-10T02:04:48.854+0000 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-16000.sock
2018-01-10T02:04:48.854+0000 I NETWORK [initandlisten] shutdown: going to flush diaglog...
2018-01-10T02:04:48.855+0000 I CONTROL [initandlisten] now exiting
2018-01-10T02:04:48.855+0000 I CONTROL [initandlisten] shutting down with code:100

sizeStorer.wt

Comment by Kelsey Schubert [ 09/Jan/18 ]

Hello KevinWong,

Thank you for your report. I've attached a repair attempt of the files you provided. Please extract these files and replace them in your $dbpath and let us know if it resolves the issue. If you are still seeing errors after replacing these files, please provide the complete logs from the affected node so that we can further investigate.

Thank you,
Kelsey

Generated at Thu Feb 08 04:30:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.