[SERVER-39393] WiredTiger.turtle: encountered an illegal file format or internal value Created: 06/Feb/19  Updated: 19/Mar/19  Resolved: 19/Mar/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.4
Fix Version/s: None

Type: Question Priority: Critical - P2
Reporter: Andrew Perella Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows10 Pro x64


Attachments: File mongod.2019-02-05T19-33-13.mdmp     Text File mongodb.log    
Issue Links:
Related
is related to WT-4527 (MongoDB-3.4.14) Investigate if failu... Closed
Participants:

 Description   

Mongo 3.6.4 crashed after 5 hours of light operation.

2019-02-05T19:33:13.782+0000 E STORAGE [WTCheckpointThread] WiredTiger error (13) [1549395193:781511][130620:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: data\WiredTiger.turtle.set to data\WiredTiger.turtle: file-rename: MoveFileExW: Access is denied.
: Permission denied
2019-02-05T19:33:13.784+0000 E STORAGE [WTCheckpointThread] WiredTiger error (0) [1549395193:783517][130620:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: WiredTiger.turtle: encountered an illegal file format or internal value: (__wt_turtle_update, 337)
2019-02-05T19:33:13.784+0000 E STORAGE [WTCheckpointThread] WiredTiger error (-31804) [1549395193:783517][130620:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-02-05T19:33:13.784+0000 F - [WTCheckpointThread] Fatal Assertion 28558 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 361
2019-02-05T19:33:13.784+0000 F - [WTCheckpointThread]

***aborting after fassert() failure
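
When correlating these crashes with file-access traces from another tool, it can help to pull the failed rename events out of a longer mongod log. The following is a minimal sketch, not part of MongoDB or WiredTiger; the name `find_denied_renames` and the regular expression are assumptions written against the log lines exactly as they appear in this ticket.

```python
import re

# Matches the "file-rename ... Access is denied." lines quoted in this ticket.
# Hypothetical pattern: other mongod versions may format the line differently.
RENAME_DENIED = re.compile(
    r"^(?P<ts>\S+) E STORAGE .*?: (?P<src>\S+) to (?P<dst>\S+): "
    r"file-rename: MoveFileExW: Access is denied\."
)


def find_denied_renames(lines):
    """Return (timestamp, source, target) for each failed rename line."""
    hits = []
    for line in lines:
        match = RENAME_DENIED.match(line)
        if match:
            hits.append((match["ts"], match["src"], match["dst"]))
    return hits
```

The returned timestamps can then be compared against whatever else was touching the data directory at that moment.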



 Comments   
Comment by Danny Hatcher (Inactive) [ 19/Mar/19 ]

Hello Andrew,

Thanks for letting us know. I'll close this ticket now.

Have a great day,

Danny

Comment by Andrew Perella [ 19/Mar/19 ]

Yes, these are easy to exclude thanks.

Comment by Danny Hatcher (Inactive) [ 22/Feb/19 ]

Hello Andrew,

That's great to hear! I'm not familiar with Tortoise; would you be able to exclude the MongoDB directories from its actions? If so, that should be a relatively easy solution.

Thanks,

Danny

Comment by Andrew Perella [ 21/Feb/19 ]

It does look like it might have been TSVNCache.exe, part of TortoiseSVN. Since disabling it I have had no more crashes so far.

Comment by Andrew Perella [ 20/Feb/19 ]

I am now running Sysinternals Procmon to monitor file access; hopefully it will confirm whether this is the problem.

Comment by Danny Hatcher (Inactive) [ 20/Feb/19 ]

Hello Andrew,

I've spoken with our WiredTiger engineers and they've only ever seen this failure from two causes:

  • A different process is accessing database files
  • An odd filesystem has transient access errors

If you're sure that there is no other process on the server that could be accessing the files, could you be using a non-standard filesystem?

Thank you,

Danny

Comment by Andrew Perella [ 12/Feb/19 ]

I have attached mongodb.log for a 4.0.5 build crash.

I am inserting docs like:

{
    "_id" : ObjectId("5c5d745755d0d894e649911e"),
    "state" : "good",
    "tid" : "SYSTEM",
    "pid" : "SYS",
    "name" : "cpu%",
    "ph" : "C",
    "ts" : 1549628503830.0,
    "args" : { "v" : 11.1 },
    "t" : 1549628503839.0
}

 

at a rate of typically about 10 per second.
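
For anyone trying to reproduce this, the insert workload described above could be approximated as follows. This is a rough sketch under assumptions, not the reporter's actual test code: `make_sample` and `run_inserts` are hypothetical names, the `collection` argument is assumed to be any object exposing `insert_one()` (as a pymongo `Collection` does), `_id` is left for the driver to assign, and `ts` and `t` are set to the same value for simplicity.

```python
import time


def make_sample(now_ms, cpu_percent):
    """Build one document shaped like the sample above (without _id)."""
    return {
        "state": "good",
        "tid": "SYSTEM",
        "pid": "SYS",
        "name": "cpu%",
        "ph": "C",
        "ts": float(now_ms),
        "args": {"v": cpu_percent},
        "t": float(now_ms),
    }


def run_inserts(collection, count, rate_per_second=10, sleep=time.sleep):
    """Insert `count` documents, pausing between inserts to approximate the rate.

    `collection` only needs an insert_one(document) method.
    """
    interval = 1.0 / rate_per_second
    for _ in range(count):
        now_ms = int(time.time() * 1000)
        # 11.1 is the value from the sample document, used as a placeholder.
        collection.insert_one(make_sample(now_ms, 11.1))
        sleep(interval)
```

The `sleep` parameter exists only so the pacing can be replaced in a dry run; with a real collection the default is fine.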

 

Comment by Danny Hatcher (Inactive) [ 11/Feb/19 ]

Hello Andrew,

Could you please provide the mongod log file covering the timeframe as well as the steps you ran as part of your test?

Thank you,

Danny

Comment by Andrew Perella [ 08/Feb/19 ]

So it doesn't appear to be the virus checker, at least: I ran a test with it disabled and mongod still crashed (v4.0.5).

 

Comment by Andrew Perella [ 08/Feb/19 ]

Thanks Danny - I will run some tests with this disabled, though that is not a solution for many of our use cases. Ideally, if a file needs to be renamed and the rename fails, it would be better to wait and retry than to crash (even blocking for a few seconds is better than crashing). I have seen many processes under Windows lock files, including virus checkers, backup services and shell integrations (e.g. TortoiseSVN), so this should not be seen as unusual behaviour.
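
The wait-and-retry behaviour suggested here might look roughly like the sketch below. It only illustrates the idea and is not how WiredTiger handles the turtle file rename; `rename_with_retry` and its parameters are hypothetical, and the attempt count and delay are arbitrary.

```python
import os
import time


def rename_with_retry(src, dst, attempts=5, delay_seconds=0.5,
                      replace=os.replace, sleep=time.sleep):
    """Rename src over dst, retrying if the OS reports a permission error.

    Returns the number of attempts used. Re-raises the last
    PermissionError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            replace(src, dst)
            return attempt
        except PermissionError:
            # Another process may be holding the file open; give it a
            # moment to let go unless this was the final attempt.
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

A bounded number of attempts keeps a permanently locked file from blocking forever, while a short-lived lock from a scanner or shell extension would usually clear within the retry window.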

It may be interesting to note that I have tried using mongo 3.0.15 and it hasn't crashed in 24 hours so far, whilst later versions have crashed within a few hours. I am now running tests fanning out inserts to multiple versions of MongoDB to see what that shows.

 

 

Comment by Danny Hatcher (Inactive) [ 07/Feb/19 ]

Hello Andrew,

As you mentioned in one of your comments, anti-virus software has been known to cause this specific error in the past. Unfortunately, due to the way Windows treats its files, we can try to avoid the problem as much as possible but never completely "solve" it. If you remove any and all anti-virus software from the server having the issue or run the same configuration on a different server with no anti-virus, do you still see these errors?

Thank you,

Danny

Comment by Andrew Perella [ 06/Feb/19 ]

It does look like the claimed fix in https://jira.mongodb.org/browse/WT-3962

 

Comment by Andrew Perella [ 06/Feb/19 ]

A colleague has now had a similar issue on his Windows machine with a different use case. I wonder if this might be a virus checker issue.

Comment by Andrew Perella [ 06/Feb/19 ]

I upgraded to version 4.0.5 and got almost the same crash after a few more hours. This was the log:

2019-02-06T14:01:27.464+0000 E STORAGE [WTCheckpointThread] WiredTiger error (13) [1549461687:463428][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __win_fs_rename, 105: data\WiredTiger.turtle.set to data\WiredTiger.turtle: file-rename: MoveFileExW: Access is denied.
: Permission denied Raw: [1549461687:463428][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __win_fs_rename, 105: data\WiredTiger.turtle.set to data\WiredTiger.turtle: file-rename: MoveFileExW: Access is denied.
: Permission denied
2019-02-06T14:01:27.464+0000 E STORAGE [WTCheckpointThread] WiredTiger error (13) [1549461687:464431][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_turtle_update, 397: WiredTiger.turtle: fatal turtle file update error: Permission denied Raw: [1549461687:464431][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_turtle_update, 397: WiredTiger.turtle: fatal turtle file update error: Permission denied
2019-02-06T14:01:27.464+0000 E STORAGE [WTCheckpointThread] WiredTiger error (-31804) [1549461687:464431][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1549461687:464431][152316:140706046763344], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-02-06T14:01:27.465+0000 F - [WTCheckpointThread] Fatal Assertion 50853 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 409
2019-02-06T14:01:27.465+0000 F - [WTCheckpointThread]

***aborting after fassert() failure

 

I have also uploaded a minidump for this:

 mongod.2019-02-06T14-01-27.v4.0.5.mdmp

 

Comment by Andrew Perella [ 06/Feb/19 ]

This may be related to WT-4143.