[SERVER-82314] e storage [wtcheckpointthread] wiredtiger error(9)
Created: 19/Oct/23  Updated: 08/Jan/24  Resolved: 08/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: srini bijjam
Assignee: Edwin Zhou
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Azure VM using Linux OS.


Operating System: ALL
Participants:

 Description   

The mongod service is going down unexpectedly with the error below.

We built this database on an Azure VM a couple of days ago.

 

2023-10-18T11:52:39.375+0000 I NETWORK  [listener] connection accepted from 168.63.129.16:60101 #2597 (2 connections now open)

2023-10-18T11:52:43.573+0000 E STORAGE  [WTCheckpointThread] WiredTiger error (9) [1697629963:573590][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __posix_sync, 108: /opt/app/mcap/index-10298-3638152389526840726.wt: handle-sync: fdatasync: Bad file descriptor Raw: [1697629963:573590][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __posix_sync, 108: /opt/app/mcap/index-10298-3638152389526840726.wt: handle-sync: fdatasync: Bad file descriptor

2023-10-18T11:52:43.573+0000 E STORAGE  [WTCheckpointThread] WiredTiger error (-31804) [1697629963:573756][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1697629963:573756][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic

2023-10-18T11:52:43.573+0000 F -        [WTCheckpointThread] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 420

2023-10-18T11:52:43.573+0000 F -        [WTCheckpointThread] \n\n***aborting after fassert() failure\n\n

2023-10-18T11:52:43.587+0000 F -        [WTCheckpointThread] Got signal: 6 (Aborted).
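
For readers triaging the excerpt above, the numeric codes in the two `WiredTiger error (...)` lines can be decoded with a short script. This is a minimal sketch, not part of the original report: the helper name `describe_wt_error` is hypothetical, and it assumes only what the log itself shows, namely that the positive code 9 comes from the operating system (`fdatasync: Bad file descriptor`) and that -31804 is reported as `WT_PANIC`.

```python
import errno
import os
import re

# Matches the "WiredTiger error (<code>)" prefix seen in the log lines above.
WT_ERROR = re.compile(r"WiredTiger error \((-?\d+)\)")


def describe_wt_error(log_line):
    """Return (code, name, message) for a WiredTiger error log line, or None.

    Hypothetical helper for illustration only.
    """
    match = WT_ERROR.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    if code > 0:
        # Positive codes are operating-system errno values (9 is EBADF).
        return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
    # Negative codes are WiredTiger-specific; the log names -31804 as WT_PANIC.
    name = "WT_PANIC" if code == -31804 else "WT_SPECIFIC"
    return code, name, ""
```

Applied to the first error line, this reports `EBADF`, which matches the `Bad file descriptor` text in the message. The panic and the fatal assertion that follow are consequences of that failed `fdatasync` during the checkpoint.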

 



 Comments   
Comment by Edwin Zhou [ 08/Jan/24 ]

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Edwin Zhou [ 13/Dec/23 ]

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the diagnostics I requested in my previous comment? Here is an updated upload portal link.

Comment by Edwin Zhou [ 27/Oct/23 ]

Hi bijjamsrini@gmail.com,

Thank you for your report. To proceed further with this investigation, we will need additional diagnostics. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time.

For each node in the replica set, would you please archive (tar or zip) the following, covering a time period that includes the incident, and upload the archive to that link:

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)
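
One way to build the requested archive on each node is sketched below. It is only an illustration under stated assumptions: the function name `archive_diagnostics` and the example paths in the comments are hypothetical, and the real log location and `$dbpath` depend on the node's mongod configuration.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(log_path, dbpath, out_path):
    """Bundle a mongod log file and $dbpath/diagnostic.data into one .tar.gz.

    Hypothetical helper; all three arguments are supplied by the caller.
    """
    log_path = Path(log_path)
    diagnostic_dir = Path(dbpath) / "diagnostic.data"
    with tarfile.open(str(out_path), "w:gz") as tar:
        # Store entries under short names rather than absolute paths.
        tar.add(str(log_path), arcname=log_path.name)
        # Directories are added recursively by default.
        tar.add(str(diagnostic_dir), arcname="diagnostic.data")
    return out_path


# Example call (paths are assumptions, adjust to the actual deployment):
# archive_diagnostics("/var/log/mongodb/mongod.log", "/opt/app/mcap",
#                     "node1-diagnostics.tar.gz")
```

If the logs have been rotated, the rotated files covering the incident would need to be added as well.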

In addition, can you please provide the MongoDB version used when hitting this issue?

Kind regards,
Edwin
