[SERVER-73577] Instance in Recovering State, Initial Sync Fails Created: 03/Feb/23  Updated: 03/Feb/23  Resolved: 03/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: İlker Demirci Assignee: Yuan Fang
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File error1.jpg     JPEG File error2.jpg    
Operating System: ALL
Participants:

 Description   

Hello,

I have a MongoDB cluster with 3 instances. One of the instances changed its state to Recovering. When I checked mongod.log, I saw an error like "non-specific WiredTiger error". Here is an image of it:

 

I deleted the data directory and attempted an initial sync after this issue, but the sync was interrupted by another error: "Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback". Image:

The sync process then restarted from the beginning. It has been almost 4 days, but the member is still in the startup state.

 

The data size is around 650 GB. Copying and indexing have finished, and the member has been applying oplog entries for 2 days now. It is trying to catch up to the cluster, but because it has been behind for several days, the oplog phase is taking too long.
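
To put a number on how far behind the member is, a rough sketch like the one below could be used. It assumes a status document shaped like the output of the `replSetGetStatus` command (a `members` list with `name`, `stateStr`, and `optimeDate` fields); the host names and timestamps in `sample` are made up for illustration, and a member that is still in initial sync may not report a meaningful `optimeDate`.

```python
from datetime import datetime


def member_lag_seconds(status):
    """Map each member name to (stateStr, seconds behind the primary or None)."""
    members = status.get("members", [])
    # Use the member currently reporting PRIMARY as the reference point.
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    report = {}
    for m in members:
        lag = None
        if primary is not None and "optimeDate" in m and "optimeDate" in primary:
            lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        report[m["name"]] = (m.get("stateStr"), lag)
    return report


# Hypothetical sample, shaped like replSetGetStatus output (not real data).
# With pymongo installed, a live document could be fetched with something like:
#   status = MongoClient(uri).admin.command("replSetGetStatus")
sample = {
    "members": [
        {"name": "node1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2023, 2, 3, 12, 0, 0)},
        {"name": "node2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2023, 2, 3, 11, 59, 58)},
        {"name": "node3:27017", "stateStr": "STARTUP2",
         "optimeDate": datetime(2023, 2, 1, 12, 0, 0)},
    ]
}

for name, (state, lag) in member_lag_seconds(sample).items():
    print(name, state, lag)
```

If the lag keeps growing rather than shrinking while oplog entries are applied, the member is probably not going to catch up on its own.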

I am trying to understand why it changed its state to Recovering. Is it because the data was somehow corrupted?

By the way, this member of the cluster has hit this error more than once, while the other members are doing just fine. Even after I resync this member from the others, the error eventually recurs.

 

Is there a specific reason why this kind of error keeps recurring?

 Comments   
Comment by Yuan Fang [ 03/Feb/23 ]

Hi ilker.demirci@netmera.com,

Thank you for reporting the issue. The error message displayed in the first screenshot appears to suggest some form of physical corruption.  The answer in the MongoDB developer community forum provides helpful information on how to recover from data file corruption. If you have any questions, I highly recommend starting there. If the discussion there leads you to suspect a bug in the MongoDB server, please let us know so that we can investigate it further in the SERVER project.

Regards,
Yuan

Generated at Thu Feb 08 06:25:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.