[SERVER-52530] Mongo v.4.4.1 crash - UnknownError -31803: WT_NOTFOUND Created: 31/Oct/20  Updated: 29/Oct/23  Resolved: 20/Jan/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.1
Fix Version/s: 4.4.3

Type: Question Priority: Major - P3
Reporter: Francesco Pepe Assignee: Dmitry Agranat
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian 10, CPU 48 core, RAM 192GB DDR4, nvme ssd, no high load when it happened.


Attachments: PNG File image-2020-11-05-10-23-25-571.png     PNG File image-2020-11-05-11-02-08-836.png     Text File mongo-crash_31-10-20.log    
Issue Links:
Related
is related to SERVER-50971 Invariant failure, WT_NOTFOUND: item ... Closed
is related to SERVER-50880 Mongod Server Failed with signal 6 Closed
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   

Hi, my mongo instance suddenly crashed, can you please take a look at the log and help me to understand why it happened and how to avoid it?

After restarting the service it seems to work fine, as before. It had been running for 3 weeks without any problem.

Thanks, Regards



 Comments   
Comment by Dmitry Agranat [ 20/Jan/21 ]

Hi francescop85e@yahoo.it, I will go ahead and close this case as the reported issue was fixed in 4.4.3. Please reopen if you still experience the same issue after upgrading to 4.4.3.

Regards,
Dima

Comment by Dmitry Agranat [ 07/Jan/21 ]

Hi francescop85e@yahoo.it,

We've done some work in 4.4.3 to try to fix this issue. Would it be possible for you to try 4.4.3 and provide us with feedback?

Thanks,
Dima

Comment by Luke Pearson [ 22/Nov/20 ]

Hi francescop85e@yahoo.it,

I'm sorry to hear that you're still experiencing these crashes frequently. I'd suggest downgrading to the latest version of 4.2 in the meantime. We're still working on a fix for this issue, which we believe occurs when a checkpoint is ahead of a reader/writer interacting with the history store, which can result in that reader/writer getting WT_NOTFOUND.

Comment by Francesco Pepe [ 22/Nov/20 ]

Please, can you suggest a safe version? Lately it's been happening very often, and we cannot leave it like this.

Comment by Francesco Pepe [ 19/Nov/20 ]

Thank you, we're evaluating a downgrade because yesterday it happened again and this is causing some problems. Can you please suggest a version to downgrade to?

Comment by Dmitry Agranat [ 18/Nov/20 ]

Thanks for the update francescop85e@yahoo.it, we are currently discussing a possible fix to address this issue.

Comment by Francesco Pepe [ 16/Nov/20 ]

Good morning, yesterday it unexpectedly crashed again with the same error. I uploaded the log files to the secure uploader.

Comment by Francesco Pepe [ 03/Nov/20 ]

Hi,
thanks for your feedback. I uploaded the diagnostic file up to the crash, the next one after the restart, and a view of the resources in use when it happened.

Regards, Francesco

Comment by Dmitry Agranat [ 01/Nov/20 ]

Hi francescop85e@yahoo.it,

We've started looking at this issue. Do you happen to have a core dump from this event? If you do, please upload it to the secure uploader.

Can you also upload a full mongod log and the $dbpath/diagnostic.data directory (the contents are described here) to the same secure uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Generated at Thu Feb 08 05:28:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.