[SERVER-43991] encountered an illegal file format or internal value: 0x0: Created: 14/Oct/19  Updated: 27/Oct/23  Resolved: 08/Dec/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: krzysztof osmulski Assignee: Danny Hatcher (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive mongod.illegal_file_format.zip    
Operating System: ALL
Sprint: Storage Engines 2019-11-04
Participants:
Story Points: 3

 Description   

The server (host) did not crash or restart; its actual uptime is 45 days.

The mongod process had not restarted during that time either.

Then mongod crashed:

2019-10-14T12:50:13.572+0200 I NETWORK [conn47816] received client metadata from 127.0.0.1:52150 conn47816: { driver: { name: "mongo-java-driver", version: "3.9.1" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "4.4.0-148-generic" }, platform: "Java/Oracle Corporation/1.8.0_201-b09" }
2019-10-14T12:52:46.143+0200 E STORAGE [conn47814] WiredTiger error (22) [1571050366:142348][25057:0x7f6756b9f700], file:index-8-5793834522867476723.wt, WT_CURSOR.search: __cell_data_ref, 626: encountered an illegal file format or internal value: 0x0: Invalid argument Raw: [1571050366:142348][25057:0x7f6756b9f700], file:index-8-5793834522867476723.wt, WT_CURSOR.search: __cell_data_ref, 626: encountered an illegal file format or internal value: 0x0: Invalid argument
2019-10-14T12:52:46.143+0200 E STORAGE [conn47814] WiredTiger error (-31804) [1571050366:143726][25057:0x7f6756b9f700], file:index-8-5793834522867476723.wt, WT_CURSOR.search: __wt_panic, 520: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1571050366:143726][25057:0x7f6756b9f700], file:index-8-5793834522867476723.wt, WT_CURSOR.search: __wt_panic, 520: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-10-14T12:52:46.143+0200 F - [conn47814] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 409
2019-10-14T12:52:46.144+0200 F - [conn47814]

***aborting after fassert() failure

2019-10-14T12:52:46.222+0200 F - [conn47814] Got signal: 6 (Aborted).
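The two WiredTiger errors in this log carry numeric codes: 22 is the POSIX errno EINVAL (printed as "Invalid argument"), and -31804 is the WiredTiger-specific code that the log itself labels WT_PANIC. Below is a minimal Python sketch for triaging such lines. It assumes the `WiredTiger error (N)` format shown above. The helper name `decode_wt_error` is hypothetical, and its table of WiredTiger-specific codes holds only the one value this log confirms.

```python
import errno
import re

# WiredTiger-specific (negative) codes. Only WT_PANIC is included, because
# it is the only one whose name is confirmed by the log above.
WT_CODES = {-31804: "WT_PANIC"}

# Matches the "WiredTiger error (N)" prefix used in mongod 4.0 log lines.
WT_ERROR_RE = re.compile(r"WiredTiger error \((-?\d+)\)")


def decode_wt_error(line):
    """Return a symbolic name for the WiredTiger error code in a log line, or None."""
    match = WT_ERROR_RE.search(line)
    if match is None:
        return None  # not a WiredTiger error line
    code = int(match.group(1))
    if code > 0:
        # Positive codes are ordinary errno values (e.g. 22 -> EINVAL).
        return errno.errorcode.get(code, f"errno {code}")
    return WT_CODES.get(code, f"WiredTiger code {code}")
```

Positive codes fall back to Python's `errno` table, so their names come from the host platform; EINVAL is 22 on Linux.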



 Comments   
Comment by Brian Lane [ 08/Dec/19 ]

Thanks clydzik@wp.pl,

Feel free to reopen or create a new issue if you experience any other issues.

-Brian

Comment by krzysztof osmulski [ 07/Dec/19 ]

I can confirm that there is a problem with the cheap SSD.
I have not tried other storage yet.
We can close the issue.

Comment by Danny Hatcher (Inactive) [ 06/Dec/19 ]

Have you experienced issues even after switching disks?

Comment by Danny Hatcher (Inactive) [ 29/Oct/19 ]

At this point we believe the issue is related to data corruption at the disk level. I believe the best path forward is to try another disk. If you still experience issues on a new disk, we can look further.

This case is a good example of the value of replication. If three different servers hold copies of your data, one failing disk is not a problem, because you still have two copies you can sync from.
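The replication argument can be made concrete with the majority rule that MongoDB replica sets use for elections: a primary can only be elected while a strict majority of voting members is reachable. The small Python sketch below shows that arithmetic. The function name is illustrative only. A standalone server, as in this report, tolerates no failures, while a three-member set tolerates one.

```python
def replica_set_fault_tolerance(voting_members: int) -> int:
    """Number of voting members that can be lost while the rest can still elect a primary."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    majority = voting_members // 2 + 1  # strict majority required for an election
    return voting_members - majority


for n in (1, 3, 4, 5):
    print(n, "members -> tolerates", replica_set_fault_tolerance(n), "failure(s)")
```

Adding a fourth member does not raise the tolerance above that of three, which is why odd-sized sets are the usual recommendation.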

Comment by krzysztof osmulski [ 26/Oct/19 ]

This is a single standalone server installation.
I have experienced issues with MongoDB on this installation in the past.
For this problem, the server had no downtime or restarts between running mongod --repair and the occurrence.

Since I have experienced other issues, I can believe this may be a storage problem, but it is difficult to confirm.
The storage is a fairly cheap 500 GB SSD. It reports good health in S.M.A.R.T., and it is fairly new.
There is also a MySQL instance hosting a Wikipedia mirror on the same storage, and I have not experienced any issues with it, but it is mostly read-only.

Now I would like to verify the storage sector by sector, but I am not sure how to do that for Linux ext4.

I also made small tweaks to journaling in MongoDB and in the file system, following guidance for SSD drives.
As I understand it, this should not affect MongoDB while it runs uninterrupted, which is the case here: based on the machine's uptime, I am sure the mongod instance was not interrupted until the occurrence.
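One way to approach the sector-by-sector check is a read-only pass over the whole block device that records any chunk whose read fails with an I/O error. The standard Linux tool for this is badblocks, which runs a read-only test by default. The Python sketch below shows the same idea. The function name and the 1 MiB default chunk size are assumptions for illustration. Two caveats apply. First, recently read data may be served from the page cache. Second, a clean pass only shows that the drive returned data without reporting an error; it cannot detect silent corruption, where the drive returns wrong bytes.

```python
import os


def scan_for_read_errors(path, chunk_size=1 << 20):
    """Read `path` sequentially; return (bytes_read, offsets of chunks that failed to read).

    Hypothetical helper: works on a regular file or, run as root, a block device node.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)  # read-only: never writes to the device
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # file or device size in bytes
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                bytes_read += len(os.pread(fd, length, offset))
            except OSError:
                # An unreadable region (typically EIO); note it and skip past it.
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

To scan the SSD itself, point it at the drive's device node (for example a hypothetical /dev/sdX), run it as root, and preferably stop mongod first so the scan does not compete with live I/O.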

Comment by Danny Hatcher (Inactive) [ 25/Oct/19 ]

clydzik@wp.pl, we are still investigating this problem but we believe this issue may have been caused by disk corruption. Did the server in question experience any other issues around the time of the assertion? Have you seen the issue on other servers or just one?

Comment by krzysztof osmulski [ 14/Oct/19 ]

During mongod --verify I found this log entry:

2019-10-14T20:27:08.648+0200 I STORAGE [initandlisten] Invalid BSON detected at RecordId(36455202): InvalidBSON: not null terminated string in element with field name 'url' in object with _id: "7870558706". Deleting.

I am not sure whether this is related.
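For context on the Invalid BSON message: per the BSON specification, a string value is stored as a little-endian int32 length that counts the trailing NUL, followed by the UTF-8 bytes and a terminating 0x00. "Not null terminated" means the byte where that terminator should sit was not 0x00. That is one possible symptom of bytes damaged on disk. The minimal Python sketch below performs that single check. The helper name is hypothetical, and it validates the framing of one string element only, not a whole document.

```python
import struct


def bson_string_is_terminated(buf: bytes, offset: int) -> bool:
    """Check the framing of a BSON string value that starts at `offset` in `buf`."""
    # Layout: int32 length L (little-endian, includes the NUL), L - 1 UTF-8 bytes, 0x00.
    if offset < 0 or offset + 4 > len(buf):
        return False  # not enough bytes for the length prefix
    (length,) = struct.unpack_from("<i", buf, offset)
    end = offset + 4 + length  # index one past the terminator
    if length < 1 or end > len(buf):
        return False  # the length prefix cannot be valid for this buffer
    return buf[end - 1] == 0  # the last byte must be the NUL terminator
```

For example, the value "abc" is encoded as the bytes 04 00 00 00 61 62 63 00.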

Generated at Thu Feb 08 05:04:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.