[SERVER-33964] Replica set's secondary member's optimeDate sometimes does not update Created: 19/Mar/18  Updated: 23/Apr/18  Resolved: 27/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ken Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File primary.log, JPEG File rs.printSlaveReplicationInfo.JPG, JPEG File rs.status1.JPG, JPEG File rs.status2.JPG, Text File secondary.log
Operating System: ALL
Participants:

 Description   

I have a replica set deployed with a Primary, a Secondary, and an Arbiter.
The secondary runs on an unstable server, and sometimes that server hangs and goes down. After the server and mongod come back up automatically, the secondary's optimeDate (data) no longer syncs with the primary's optimeDate (data), even though every other status on the secondary mongod looks fine.
Please find attached the log from the secondary server and the rs.status output.



 Comments   
Comment by Kelsey Schubert [ 27/Mar/18 ]

Hi kenyang001,

Examining the logs, I see that there were significant network connectivity issues starting at around 2018-03-25T19:09 UTC, which were resolved around 2018-03-26T01:23 UTC. However, the last optime appears in the middle of this issue at 2018-03-25T21:50 UTC. This indicates to me that the secondary is most likely applying operations replicated from the primary successfully since the event, but has not yet fully caught up to the primary following the network connectivity problems.
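One way to check whether a secondary is still catching up is to compare each member's optimeDate against the primary's. The following is a minimal sketch, assuming the rs.status() document has already been loaded into a Python dict whose members entries carry name, stateStr, and optimeDate; the function name secondary_lag_seconds and the host names in the sample are hypothetical, and the primary's optimeDate in the sample is an illustrative value, not one taken from the attached logs.

```python
from datetime import datetime, timezone


def secondary_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each SECONDARY.

    `status` is assumed to be a dict shaped like rs.status() output.
    """
    members = status["members"]
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY member in status document")

    lags = {}
    for member in members:
        # Arbiters hold no data, so only secondaries are compared.
        if member.get("stateStr") == "SECONDARY":
            delta = primary["optimeDate"] - member["optimeDate"]
            lags[member["name"]] = delta.total_seconds()
    return lags


if __name__ == "__main__":
    # Hypothetical sample: host names and the primary's optimeDate are made up.
    sample = {
        "members": [
            {"name": "primary.example:27018", "stateStr": "PRIMARY",
             "optimeDate": datetime(2018, 3, 26, 1, 23, tzinfo=timezone.utc)},
            {"name": "secondary.example:27018", "stateStr": "SECONDARY",
             "optimeDate": datetime(2018, 3, 25, 21, 50, tzinfo=timezone.utc)},
            {"name": "arbiter.example:27018", "stateStr": "ARBITER"},
        ]
    }
    print(secondary_lag_seconds(sample))
```

If the reported lag shrinks between two successive rs.status() snapshots, the secondary is catching up rather than stuck.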

I see log lines like:

[NetworkInterfaceASIO-Replication-0] Failed to connect to a_domain_address:27018 - HostUnreachable: No route to host
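To estimate how long such errors lasted, the log can be scanned for the first and last HostUnreachable entries. This is a rough sketch, assuming each log line begins with an ISO-8601 timestamp such as 2018-03-25T19:09:02.101+0000 (the excerpt above omits that prefix); the function name unreachable_window is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def unreachable_window(lines):
    """Return (first, last) timestamps of lines mentioning HostUnreachable.

    Returns None when no matching line with a parseable timestamp is found.
    """
    stamps = []
    for line in lines:
        if "HostUnreachable" not in line:
            continue
        first_token = line.split(" ", 1)[0]
        try:
            stamps.append(datetime.strptime(first_token, TS_FORMAT))
        except ValueError:
            # Skip lines that do not start with the assumed timestamp format.
            continue
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

Typical use would be to pass an open file object for secondary.log, for example unreachable_window(open("secondary.log")), and compare the resulting window against the secondary's last optime.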

Consequently, my advice would be to work to stabilize this host/network.

Kind regards,
Kelsey

Comment by ken [ 27/Mar/18 ]

@Ramon Fernandez, the logs are attached. Could you please take a look? Thanks.

Comment by ken [ 26/Mar/18 ]

Hi Ramon,

It happened again, and I have uploaded the requested attachments. Could you please check them? Thanks.

Comment by ken [ 20/Mar/18 ]

Hi Ramon,

Thanks for your feedback. I will upload the requested information once it happens again. I expect it to recur within 2 days, and I will upload it then. Thanks.

Comment by Ramon Fernandez Marina [ 19/Mar/18 ]

From the data you provided, it seems this secondary node can't establish a reliable connection to the primary to sync. If you can upload the full output of rs.status() and rs.printSlaveReplicationInfo() as well as full logs for the primary and secondary we may be able to tell more.

Regards,
Ramón.

Generated at Thu Feb 08 04:35:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.