[SERVER-37820] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback: Operation timed out Created: 30/Oct/18  Updated: 31/Oct/18  Resolved: 30/Oct/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: gen Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File metrics.2018-10-30T01-09-14Z-00000     Text File mongod.log    
Operating System: ALL
Steps To Reproduce:

1) Start with a standalone instance holding ~40 GB of data.

2) Initiate it as a replica set so that it plays the primary role.

3) Add a secondary machine to this replica set (a shell sketch of steps 1-3 follows this list).

4) The secondary stays stuck in STARTUP2 for a few days.

5) Eventually the secondary exits with a fatal error.
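
A minimal shell sketch of steps 1-3, assuming default ports, a dbpath of /data/db, a placeholder replica set name "rs0", and a placeholder secondary hostname (none of these values come from the reporter's environment):

    # Restart the existing standalone mongod with a replica set name ("rs0" is a placeholder).
    mongod --dbpath /data/db --replSet rs0

    # From the mongo shell on that node: initiate the set, then add the new secondary.
    mongo --eval 'rs.initiate()'
    mongo --eval 'rs.add("secondary.example.com:27017")'   # placeholder hostname
    mongo --eval 'rs.status()'                             # the new member should appear in STARTUP2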

Participants:

 Description   

The secondary node failed to sync data and crashed.

The attached files include the secondary's mongod.log and diagnostic.data; I hope these help.

My system is CentOS 6 with a 2-core CPU and 4 GB of RAM.



 Comments   
Comment by gen [ 31/Oct/18 ]

Hi Danny,

After upgrading the memory to 8 GB and restarting mongod (without deleting the files in the db path), the secondary node reached the SECONDARY state successfully.

It looks like insufficient hardware resources caused this problem; the secondary node now works correctly.

Thanks for your help.

Comment by Danny Hatcher (Inactive) [ 31/Oct/18 ]

Hello,

"Could I say this mongod process was killed by a CPU usage limitation in my case? Could you advise what CPU I should use for this situation?"

Unfortunately, it appears there was some other activity on the server that caused the mongod to crash. Because I am not aware of what the other activity was, I cannot give a good estimate on what CPU to use.

"If I restart mongod, will the sync procedure continue?"

I would recommend fully deleting everything in the data directory and running a clean initial sync. That will ensure that there is no corruption as a result of the crash.
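
For reference, a clean initial sync on the secondary usually looks like the sketch below, assuming the mongod is managed as a service named mongod and uses /var/lib/mongo as its dbpath (both are assumptions; substitute the actual service name and dbpath):

    # Stop the secondary and wipe its data directory (assumed dbpath shown).
    sudo service mongod stop
    rm -rf /var/lib/mongo/*

    # On restart the member rejoins the replica set and performs a full initial sync from the primary.
    sudo service mongod start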

Thank you,

Danny

Comment by gen [ 31/Oct/18 ]

Thanks, Daniel, for your quick response. Could I say this mongod process was killed by a CPU usage limitation in my case? Could you advise what CPU I should use for this situation?

If I restart mongod, will the sync procedure continue?

Comment by Danny Hatcher (Inactive) [ 30/Oct/18 ]

Hello,

From a look through the logs and diagnostics, it appears that there was a sudden spike in CPU usage and in reads from the disk named vda. This appears to be unrelated to MongoDB itself, as it was still replicating, so I would recommend investigating what other processes may have been active on the server at that time (see the sketch below for one way to check).
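
A general way to check historical CPU and disk activity on a Linux host, assuming the sysstat package is installed and collecting data (the sa30 file name for the 30th follows sar's default naming and is an assumption):

    # Historical CPU usage and per-device disk activity for the day of the spike.
    sar -u -f /var/log/sa/sa30        # CPU utilization samples
    sar -d -p -f /var/log/sa/sa30     # disk activity, with readable device names (e.g. vda)

    # Live view of current CPU and I/O consumers.
    top
    sudo iotop -o                     # only show processes currently doing I/O (requires iotop)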

As I do not see anything to indicate a bug in the MongoDB server, I will be closing this ticket. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag; a question like this, which involves further back-and-forth discussion, would be best posted on the mongodb-user group.

See also our Technical Support page for additional support resources.

Thank you,

Danny
