[SERVER-52935] Failed to start node of Replica Set
Created: 18/Nov/20  Updated: 23/Nov/20  Resolved: 23/Nov/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Giorgi Dvalishvili Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL

 Description   

Hello,

MongoDB version v3.2.6

I have a MongoDB replica set with 3 members (primary, secondary, and arbiter). The primary node failed, and after failover the secondary became primary. After that, I tried to start the mongod process on the failed node and received the following error: "ERROR: child process failed, exited with error number 51".

Here is the content of the log file:
E STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger (0) [1605537697:593366][3880:0x7fc823621700], file:collection-6--87924298142942209182.wt, WT_SESSION.truncate: read checksum error for 4096B block at offset 18014208: block header checksum of 1701601889 doesn't match expected checksum of 2449756693
E STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger (0) [1605537698:967296][3880:0x7fc823621700], file:collection-6--87924298142942209182.wt, WT_SESSION.truncate: collection-6--87924298142942209182.wt: encountered an illegal file format or internal value
E STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger (-31804) [1605537698:967317][3880:0x7fc823621700], file:collection-6--87924298142942209182.wt, WT_SESSION.truncate: the process must exit and restart: WT_PANIC: WiredTiger library panic
I - [WT RecordStoreThread: local.oplog.rs] Fatal Assertion 28558

I tried an initial resync, and after many hours I received the following error:

W REPL [rsBackgroundSync] we are too stale to use hostname_pr_node:port_pr_node as a sync source
I REPL [ReplicationExecutor] could not find member to sync from
E REPL [rsBackgroundSync] too stale to catch up – entering maintenance mode
I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Nov 16 18:41:34:1)
I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Nov 16 19:17:21:34b)
I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember

I tried to change the oplog size, but I have only two nodes left (the primary and the arbiter), and I can't shut down the primary node.
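The second log excerpt shows why the resync could not finish: the member's last optime (Nov 16 18:41:34) is older than the oldest entry still available in the sync source's oplog (Nov 16 19:17:21). Because the oplog is a capped collection, the operations in that roughly 36-minute gap have already been discarded, so the member cannot catch up from that source. Below is a minimal Python sketch of that comparison. It assumes the year is 2020 (taken from the ticket dates) and ignores the increment part of each optime because the seconds already differ. `is_too_stale` is a hypothetical helper for illustration, not a MongoDB API.

```python
from datetime import datetime


def is_too_stale(last_applied: datetime, oldest_available: datetime) -> bool:
    """Hypothetical check: True when the sync source's oplog no longer
    contains the member's last applied operation, leaving a gap of
    operations that cannot be replayed."""
    return last_applied < oldest_available


# Times copied from the log excerpt; the year 2020 is an assumption.
last_applied = datetime(2020, 11, 16, 18, 41, 34)
oldest_available = datetime(2020, 11, 16, 19, 17, 21)

print(is_too_stale(last_applied, oldest_available))  # True
print(oldest_available - last_applied)  # 0:35:47
```

The server itself compares full optimes rather than wall-clock datetimes, so treat this only as an illustration of the ordering rule.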

Thank you



 Comments   
Comment by Edwin Zhou [ 23/Nov/20 ]

Hi gdvalishvili01@gmail.com,

I’d first like to note that 3.2 reached end of life (EOL) in September 2018, and I encourage you to upgrade to a supported version when you can. In MongoDB 3.6 and later, the oplog can be resized without a restart.
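Whether a resync can finish depends on the sync source's oplog window, which is roughly the oplog size divided by the rate at which the oplog grows. If an initial sync takes longer than that window, its starting point rolls off the source's oplog before the new member can catch up. The result is the "too stale" error in the log above. Below is a back-of-the-envelope Python sketch. The helper names, the 1.5x safety factor, and the sample numbers (a 5120 MB oplog, 2048 MB/hour of oplog growth, a 10-hour sync) are hypothetical and are not measurements from this deployment.

```python
import math


def oplog_window_hours(oplog_mb: float, oplog_mb_per_hour: float) -> float:
    """Approximate hours of history retained by a full oplog of `oplog_mb`."""
    return oplog_mb / oplog_mb_per_hour


def required_oplog_mb(oplog_mb_per_hour: float, sync_hours: float,
                      safety_factor: float = 1.5) -> int:
    """Smallest oplog size (MB) whose window covers `sync_hours` with headroom."""
    if oplog_mb_per_hour <= 0 or sync_hours <= 0:
        raise ValueError("growth rate and sync duration must be positive")
    return math.ceil(oplog_mb_per_hour * sync_hours * safety_factor)


# Hypothetical inputs, not taken from the ticket.
rate_mb_per_hour = 2048
print(oplog_window_hours(5120, rate_mb_per_hour))  # 2.5
print(required_oplog_mb(rate_mb_per_hour, 10))  # 30720
```

On MongoDB 3.6 and later, a size computed this way (in megabytes) can be applied to a running member with the `replSetResizeOplog` admin command. On 3.2, changing the oplog size requires restarting the member.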

You've attempted exactly what we would recommend for recovering from the first error. One option to perform an initial sync successfully may be to create a new node with greater resources (CPU/RAM), but we aren't able to advise about that here. The SERVER project is for reporting bugs and feature requests for supported MongoDB versions.

Please ask on our MongoDB Developer Community Forums about potential options for bringing the failed node back up. Downtime may or may not be necessary depending on your specific circumstances.

Best,

Edwin

Generated at Thu Feb 08 05:29:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.