[SERVER-49319] WiredTiger crash after shutdown Created: 05/Jul/20  Updated: 28/Jul/20  Resolved: 28/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Anton Zolotukhin Assignee: Dmitry Agranat
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger (HTML), WiredTiger.turtle, WiredTiger.wt, image-2020-07-10-14-43-27-233.png, image-2020-07-10-14-45-44-647.png, image-2020-07-10-14-46-57-804.png, repair-1.log
Issue Links:
Related
related to WT-6771 The log should clarify what the figur... Closed
Operating System: ALL

 Description   

Hello! I have an issue starting MongoDB 4 after a sudden shutdown of the server.

I have only one large single-node MongoDB deployment, with no backup.
I'm not sure whether this is the right place to post this.

I'd appreciate any help.

mongod --repair

log:

 

2020-07-05T14:55:04.120+0000 E STORAGE [initandlisten] WiredTiger error (0) [1593960904:120370][1:0x7f9d11fc6a80], file:collection-5-7105085878558723654.wt, WT_CURSOR.insert: __wt_bm_corrupt_dump, 135: {24331131842560, 188416, 969740655}: (chunk 184 of 184): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... [hex dump condensed: every byte shown in this chunk is 00]
2020-07-05T14:55:04.120+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1593960904:120454][1:0x7f9d11fc6a80], file:collection-5-7105085878558723654.wt, WT_CURSOR.insert: __wt_block_read_off, 281: collection-5-7105085878558723654.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
2020-07-05T14:55:04.120+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1593960904:120480][1:0x7f9d11fc6a80], file:collection-5-7105085878558723654.wt, WT_CURSOR.insert: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-07-05T14:55:04.120+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1593960904:120504][1:0x7f9d11fc6a80], file:collection-5-7105085878558723654.wt, txn-recover: __txn_op_apply, 287: operation apply failed during recovery: operation type 4 at LSN 268058/40238976: WT_PANIC: WiredTiger library panic
2020-07-05T14:55:04.120+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1593960904:120551][1:0x7f9d11fc6a80], file:collection-5-7105085878558723654.wt, txn-recover: __wt_txn_recover, 706: Recovery failed: WT_PANIC: WiredTiger library panic
2020-07-05T14:55:04.121+0000 E STORAGE [initandlisten] WiredTiger error (0) [1593960904:121027][1:0x7f9d11fc6a80], connection: __wt_cache_destroy, 350: cache server: exiting with 18 pages in memory and 9 pages evicted
2020-07-05T14:55:04.121+0000 E STORAGE [initandlisten] WiredTiger error (0) [1593960904:121066][1:0x7f9d11fc6a80], connection: __wt_cache_destroy, 355: cache server: exiting with 79216 image bytes in memory
2020-07-05T14:55:04.121+0000 E STORAGE [initandlisten] WiredTiger error (0) [1593960904:121085][1:0x7f9d11fc6a80], connection: __wt_cache_destroy, 358: cache server: exiting with 131776 bytes in memory
2020-07-05T14:55:04.121+0000 E STORAGE [initandlisten] WiredTiger error (0) [1593960904:121100][1:0x7f9d11fc6a80], connection: __wt_cache_destroy, 364: cache server: exiting with 83777 bytes dirty and 4 pages dirty
2020-07-05T14:55:04.122+0000 F STORAGE [initandlisten] Failed to salvage WiredTiger metadata: -31809: WT_TRY_SALVAGE: database corruption detected
2020-07-05T14:55:04.122+0000 F - [initandlisten] Fatal Assertion 50947 at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 722
2020-07-05T14:55:04.122+0000 F - [initandlisten]
***aborting after fassert() failure
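The numbers in the first and third log lines can be decoded by hand. Below is a minimal Python sketch, assuming (this is not confirmed anywhere in the ticket) that the braces in the `__wt_bm_corrupt_dump` message hold the damaged block's file offset, size and checksum, and that an LSN is printed as `log-file-number/offset` into a file named `WiredTigerLog.` followed by a 10-digit number. The helpers `decode_block` and `decode_lsn` are hypothetical, written only for this illustration.

```python
import re

# Hypothetical patterns, assuming the 4.0-era message layout seen in this log.
BLOCK_RE = re.compile(r"\{(\d+), (\d+), (\d+)\}: \(chunk (\d+) of (\d+)\)")
LSN_RE = re.compile(r"LSN (\d+)/(\d+)")

TIB = 1024 ** 4  # bytes per tebibyte


def decode_block(line):
    """Extract (assumed) offset, size, checksum and chunk info from a corrupt-dump line."""
    m = BLOCK_RE.search(line)
    if m is None:
        return None
    offset, size, checksum, chunk, chunks = map(int, m.groups())
    return {
        "offset": offset,
        "size": size,
        "checksum": checksum,
        "chunk": chunk,
        "chunks": chunks,
        "offset_tib": offset / TIB,          # how far into the .wt file the block starts
        "bytes_per_chunk": size // chunks,   # dump chunk size implied by the log line
    }


def decode_lsn(line):
    """Map an 'LSN file/offset' pair to an (assumed) journal file name and byte offset."""
    m = LSN_RE.search(line)
    if m is None:
        return None
    file_no, offset = map(int, m.groups())
    return {"log_file": f"WiredTigerLog.{file_no:010d}", "offset": offset}


if __name__ == "__main__":
    dump = "__wt_bm_corrupt_dump, 135: {24331131842560, 188416, 969740655}: (chunk 184 of 184): 00 00"
    recover = "operation type 4 at LSN 268058/40238976: WT_PANIC: WiredTiger library panic"
    block = decode_block(dump)
    print(f"block at ~{block['offset_tib']:.2f} TiB, {block['bytes_per_chunk']} bytes per dump chunk")
    print(decode_lsn(recover))
```

Under those assumptions, the unreadable block starts roughly 22.13 TiB into the file, which fits the ~22 TB collection size the reporter gives later, and the 188416-byte block was dumped as 184 chunks of 1024 bytes, the last of which is all zeros.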
 

 



 Comments   
Comment by Dmitry Agranat [ 13/Jul/20 ]

Hi ginn.tar.gz@gmail.com,

I believe these questions would be a good addition to SERVER-40662, which aims to provide better means of tracking repair progress. Did the --repair process complete, or is it still stuck? If it did not complete, could you upload the mongod logs for us to investigate?

Regards,
Dima

Comment by Anton Zolotukhin [ 10/Jul/20 ]

After the underlying RAID array fixed some errors, 'mongod --repair' started successfully. However, it was OOM-killed several times with different WiredTiger cache sizes: we tried 2 GB, 3 GB, 7.5 GB and the default value, and each time it was killed once memory use reached the machine's 16 GB RAM capacity. So we raised the available RAM to 40 GB and swap to 54 GB (the WiredTiger cache size was not set explicitly).

After that, the salvage process ran well until it again consumed all free memory and got somewhat stuck.

Current resource consumption is shown in the attached screenshots.

Anyway, I have some questions:

  1. What do the numbers after WT_SESSION.salvage mean? According to https://jira.mongodb.org/browse/SERVER-40662 they are page counts, but how can I estimate the total progress?
  2. How can I estimate the amount of RAM needed for the repair process, given that I know the approximate number of documents in the collection, the size of the .wt file and perhaps the average document size?
  3. Should I consider trying the latest version of WiredTiger separately from MongoDB, or just upgrade MongoDB to the latest version?

The MongoDB version is now v4.0.19.

I'd appreciate any help. Thanks in advance!

 

Comment by Anton Zolotukhin [ 05/Jul/20 ]

MongoDB version is v4.0.18.
collection-5-7105085878558723654.wt is a 22 TB collection of binary data chunks.
Given the data loss, is it possible to extract any of these chunks?

Generated at Thu Feb 08 05:19:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.