[SERVER-59748] Mongo Service started failed by WiredTiger Error
Created: 02/Sep/21  Updated: 23/Sep/21  Resolved: 22/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.8
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: F C
Assignee: Edwin Zhou
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File WiredTiger, File WiredTiger.lock, File WiredTiger.turtle, File WiredTiger.wt, File WiredTiger.wt.orig, File WiredTigerHS.wt, File _mdb_catalog.wt, HTML File _repair_incomplete, Text File mongd-repair.log, File mongod.cfg, File mongod.lock, File sizeStorer.wt, File storage.bson
Operating System: ALL
Participants:

 Description   

Hi~

Since 2021-08-26 the MongoDB service has failed to start. I only noticed on 2021-09-01, and I don't know why it happened.

I installed MongoDB Community Server on Windows 10 x64 with the Mongo-4.4.msi installer, and it worked very well until 2021-08-26.

I then ran mongod manually, and it failed to start as well.

I checked the log and found this, but I don't know what caused it.

{"t":{"$date":"2021-09-02T23:52:52.699+08:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":22,"message":"[1630597972:698658][14340:140723883234640], txn-recover: __config_err, 19: Error parsing 'access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none),bloc\ufffd_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=119,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),memory_page_image_max=0,memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1),checkpoint=(WiredTigerCheckpoint.911=(addr=\"018083e4f21c45cf8381e4f4a0783d8481e41ddd6f65808080e3ba3fc0e3775fc0\",order=911,time=1629903471,size=7843840,newest_start_durable_ts=0,oldest_start_ts=0,newest_txn=0,newest_stop_durable_ts=0,newest_stop_ts=-1,newest_stop_txn=-11,prepare=0,write_gen=211836,run_write_gen=211836)),checkpoint_backup_info=,checkpoint_lsn=(644,256)' at offset 154: Unexpected character: Invalid argument"}}

I'm attaching the files and log, and I hope it helps.
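
The "offset 154" in the error above can be checked against the quoted configuration string. The following is a minimal sketch, not part of MongoDB or WiredTiger tooling: the helper name `char_at_reported_offset` and the shortened sample message are assumptions for illustration, and it assumes the offset is a zero-based character index into the string as rendered in the log.

```python
import re

def char_at_reported_offset(message):
    """Return (offset, character, context) for an 'Error parsing' message.

    Hypothetical helper: extracts the quoted config string and the reported
    offset, then looks up the character at that index (assumed zero-based).
    """
    match = re.search(r"Error parsing '(.*)' at offset (\d+)", message, re.DOTALL)
    if match is None:
        return None
    config, offset = match.group(1), int(match.group(2))
    if offset >= len(config):
        return None
    # Keep a few characters on either side so the damage is visible.
    context = config[max(0, offset - 10):offset + 10]
    return offset, config[offset], context

# Shortened copy of the logged message (config cut off after block_allocation).
sample = (
    "txn-recover: __config_err, 19: Error parsing '"
    "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8),"
    "assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none),"
    "bloc\ufffd_allocation=best' at offset 154: Unexpected character: Invalid argument"
)

if __name__ == "__main__":
    # ascii() keeps the output printable on consoles that cannot encode U+FFFD.
    print(ascii(char_at_reported_offset(sample)))
```

Under those assumptions, the character at the reported index is U+FFFD (the replacement character), sitting where the `k` of `block_allocation` would be expected.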



 Comments   
Comment by F C [ 23/Sep/21 ]

Thanks for your help~

I'll protect my data in these ways in the future.

Comment by Edwin Zhou [ 22/Sep/21 ]

Hi catfishlty1@gmail.com,

I attempted to repair the metadata files that you attached to this ticket. Unfortunately the repair attempt was not successful. To avoid a problem like this in the future, it is our strong recommendation to:

Best,
Edwin

Comment by F C [ 18/Sep/21 ]

Thanks for your reply~
My environment is a single-node MongoDB on Windows 10, on a hard drive: no SSD, no RAID, and no backup method. So I can't recover the data from a replica set, another node, or a backup. I'll do something to protect the data in the future.
I suspect the corruption was caused by a Windows blue screen.
I checked the files again and tried to correct the WiredTiger metadata file back to normal, but it's still not working. I used a hex editor to check the turtle and wt files and found some Unicode characters in place of the correct ones, which is why the parser throws the error.
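
The hex editor check described above can be approximated with a short script. This is only a sketch: the helper name `find_non_text_bytes` is made up for illustration, and it assumes the file being scanned (for example WiredTiger.turtle) is supposed to contain plain ASCII text. It is not meaningful for the binary .wt files.

```python
import sys
from pathlib import Path

def find_non_text_bytes(path, limit=20):
    """List (byte_offset, byte_value) pairs for bytes outside printable ASCII.

    Assumption: the file is expected to hold plain ASCII text, so any other
    byte is worth a closer look. Stops after `limit` hits.
    """
    data = Path(path).read_bytes()
    allowed = {0x09, 0x0A, 0x0D}  # tab, LF, CR
    hits = []
    for offset, value in enumerate(data):
        if value in allowed or 0x20 <= value <= 0x7E:
            continue
        hits.append((offset, value))
        if len(hits) >= limit:
            break
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    for offset, value in find_non_text_bytes(sys.argv[1]):
        print(f"offset {offset}: 0x{value:02x}")
```

Run it against a copy of the file rather than the original, so that the damaged files stay untouched.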
I'll reply to the points you mentioned:
1. The log file was overwritten by mistake.
2. missing
3.1 Windows10 with NTFS
3.2 Local in Hard Drive
3.3 No RAID
3.4 HDD
4. No
5. It works well
6.1 Before the corruption I used 4.4. Afterwards I upgraded to 5.x to try to restart it, but that failed. Then I downgraded to 4.4 and ran it with --repair.
6.2 No hardware changes during the corruption.
6.3 no
6.4 no

The 'mongod-repair.log' shows the repair run with '--repair'. I hope it helps. It's hard to provide other logs on Windows.

Still, I hope there's a way to salvage the data using the wt tool and the existing data files.

I hope it can help~

Comment by Edwin Zhou [ 17/Sep/21 ]

Hi catfishlty1@gmail.com, this error message leads us to suspect some form of physical corruption. Please make a complete copy of the database's $dbpath directory to safeguard so that you can work off of the current $dbpath.
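
One way to take that copy is sketched below. It is an illustration only: the helper name `copy_dbpath` is hypothetical, and the sketch assumes mongod is stopped while the files are copied.

```python
import shutil
from pathlib import Path

def copy_dbpath(dbpath, backup_dir):
    """Copy everything under dbpath to backup_dir and return the file count.

    Assumes mongod is stopped. backup_dir must not exist yet, so an earlier
    copy is never overwritten.
    """
    src, dst = Path(dbpath), Path(backup_dir)
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    # Compare the relative file lists as a basic completeness check.
    src_files = sorted(p.relative_to(src) for p in src.rglob("*") if p.is_file())
    dst_files = sorted(p.relative_to(dst) for p in dst.rglob("*") if p.is_file())
    if src_files != dst_files:
        raise RuntimeError("copy is incomplete")
    return len(dst_files)
```

The check only compares file names, not contents. Comparing checksums as well would be a stricter test.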

Our ability to determine the source of this corruption depends greatly on your ability to provide:

  1. The logs for the affected node, including before, leading up to, and after the first sign of corruption.
  2. As much of syslog and dmesg content leading up to the first sign of corruption as possible.
  3. A description of the underlying storage mechanism in use, including details like:
    1. What file system and/or volume management system is in use?
    2. Is data storage locally attached or network-attached?
    3. Are disks RAIDed and if so how?
    4. Are disks SSDs or HDDs?
  4. A description of your backup method, if any.
  5. Whether your disks have been recently checked for integrity.
  6. A history of the deployment, including:
    1. a timeline of version changes
    2. a timeline of hardware upgrade/downgrade cycles or configuration changes
    3. a timeline of disaster recovery or backup restoration activities
    4. a timeline of any manipulations of the underlying database files, including copies or moves, and information about whether mongod was running during each manipulation.

The ideal resolution is to perform a clean resync from an unaffected node.

You can also try mongod --repair using the latest version of MongoDB.

In the event that a --repair operation is unsuccessful, then please also provide:

  • The logs leading up to the first occurrence of any issue
  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.
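
A repair run that keeps its own log file could be scripted roughly as follows. This is a sketch under assumptions: the helper names and any paths passed in are hypothetical, `mongod` is assumed to be on the PATH, and the repair is assumed to run against a copy of the $dbpath rather than the original.

```python
import subprocess

def build_repair_command(dbpath, logpath, mongod="mongod"):
    """Build the argument list for a repair run that logs to its own file."""
    return [mongod, "--dbpath", str(dbpath), "--repair", "--logpath", str(logpath)]

def run_repair(dbpath, logpath):
    """Run the repair and return mongod's exit code (non-zero means failure)."""
    completed = subprocess.run(build_repair_command(dbpath, logpath), check=False)
    return completed.returncode
```

Writing the repair output to a separate file keeps it apart from the regular mongod log, so both can be attached to the ticket.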

Best,
Edwin
