[SERVER-82314] e storage [wtcheckpointthread] wiredtiger error(9) Created: 19/Oct/23 Updated: 08/Jan/24 Resolved: 08/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | srini bijjam | Assignee: | Edwin Zhou |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Azure VM using Linux OS. |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
The mongod service is going down unexpectedly with the error below. We built this database on an Azure VM a couple of days ago.
2023-10-18T11:52:39.375+0000 I NETWORK [listener] connection accepted from 168.63.129.16:60101 #2597 (2 connections now open)
2023-10-18T11:52:43.573+0000 E STORAGE [WTCheckpointThread] WiredTiger error (9) [1697629963:573590][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __posix_sync, 108: /opt/app/mcap/index-10298-3638152389526840726.wt: handle-sync: fdatasync: Bad file descriptor Raw: [1697629963:573590][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __posix_sync, 108: /opt/app/mcap/index-10298-3638152389526840726.wt: handle-sync: fdatasync: Bad file descriptor
2023-10-18T11:52:43.573+0000 E STORAGE [WTCheckpointThread] WiredTiger error (-31804) [1697629963:573756][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1697629963:573756][88800:0x7f51eba1a700], file:index-10298-3638152389526840726.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2023-10-18T11:52:43.573+0000 F - [WTCheckpointThread] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 420
2023-10-18T11:52:43.573+0000 F - [WTCheckpointThread] \n\n***aborting after fassert() failure\n\n
2023-10-18T11:52:43.587+0000 F - [WTCheckpointThread] Got signal: 6 (Aborted).
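For anyone triaging a similar log, the numeric codes in the "WiredTiger error (N)" entries can be pulled out with a short script. The following is a minimal sketch, not part of the original report: the function names `wiredtiger_error_codes` and `describe` are hypothetical, the sample text is an abbreviated copy of the entries above, and treating positive codes as OS errno values is an assumption based on the first entry, where code 9 appears alongside "Bad file descriptor" (EBADF on Linux).

```python
import errno
import os
import re

# Matches the "WiredTiger error (<code>)" fragment seen in the log above.
WT_ERROR = re.compile(r"WiredTiger error \((-?\d+)\)")


def wiredtiger_error_codes(log_text):
    """Return the WiredTiger error codes in log_text, in order of appearance."""
    return [int(match.group(1)) for match in WT_ERROR.finditer(log_text)]


def describe(code):
    """Describe a code; positive values are assumed to be OS errno values."""
    if code > 0:
        name = errno.errorcode.get(code, "unknown errno")
        return f"{code}: {name} ({os.strerror(code)})"
    # Negative values are WiredTiger's own codes; this sketch does not decode them.
    return f"{code}: WiredTiger-specific code"


# Abbreviated copies of the two error entries from the report.
sample = (
    "2023-10-18T11:52:43.573+0000 E STORAGE [WTCheckpointThread] "
    "WiredTiger error (9) [1697629963:573590][88800:0x7f51eba1a700]\n"
    "2023-10-18T11:52:43.573+0000 E STORAGE [WTCheckpointThread] "
    "WiredTiger error (-31804) [1697629963:573756][88800:0x7f51eba1a700]\n"
)

for code in wiredtiger_error_codes(sample):
    print(describe(code))
```

In this log the positive code (9) comes from the failed fdatasync call, and the negative code (-31804) is the one the log itself labels WT_PANIC.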
|
| Comments |
| Comment by Edwin Zhou [ 08/Jan/24 ] |
|
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. |
| Comment by Edwin Zhou [ 13/Dec/23 ] |
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the diagnostics I requested in my previous comment? Here is an updated upload portal link. |
| Comment by Edwin Zhou [ 27/Oct/23 ] |
|
Thank you for your report. To proceed further with this investigation, we will need additional diagnostics. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:
In addition, can you please provide the MongoDB version used when hitting this issue? Kind regards, |