[SERVER-38359] Different Uptimes shown in rs.status() across different network locations when there are latency issues Created: 03/Dec/18  Updated: 03/Dec/18  Resolved: 03/Dec/18

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Karthick [X] Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

 

Hi Team,

We have a 6-node replica set:
1. 3 nodes in one geographic location, say Location 1.
2. 3 other non-voting members in a different geographic location for the purpose of Disaster Recovery, say Location 2.

Problem:

Whenever there is a network glitch or latency between these geographic locations, we see from the logs that the DR nodes in Location 2 are unable to reach the other 3 nodes in Location 1 and vice versa,
with messages saying "unable to reach the host ip address / time out".

 

The problem is:
1. When we run rs.status() from Node 1, the Primary in Location 1, it shows the uptime of all the DR nodes in Location 2 as, say, 1600s.

2. When we run rs.status() from Node 4, a secondary in Location 2, it shows the uptime of all the primary-site nodes in Location 1 as, say, 1600s.

In fact, all the nodes have been up and running for more than a year without any downtime. The question is why the uptime is shown differently when rs.status() is evaluated from 2 different geographic locations, when none of the nodes are down.

Please help us understand why we see this difference in uptimes.

 



 Comments   
Comment by Danny Hatcher (Inactive) [ 03/Dec/18 ]

Hello Karthick,

The uptime field you are referencing simply refers to the length of time that one node has had an uninterrupted view of another node. Network issues can interrupt the back-and-forth heartbeats between nodes, which can cause uptime to start over from 0. There is no cause for concern if this number is relatively low as long as your cluster is otherwise stable.

Thank you,

Danny
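
To illustrate the explanation above, here is a minimal Python sketch that takes a document shaped like rs.status() output and lists the members whose reported uptime is below a threshold, i.e. the ones whose heartbeat view was probably interrupted recently. The sample document, the host names, the 86400-second threshold and the function name members_below are all hypothetical; only the "members", "name", "self" and "uptime" fields are taken from the rs.status() output discussed here.

```python
# Hypothetical rs.status()-like document as seen from Node 1 in Location 1.
# Host names and values are made up for illustration.
SAMPLE_STATUS = {
    "set": "rs0",
    "members": [
        {"name": "loc1-node1:27017", "self": True, "uptime": 33000000},
        {"name": "loc1-node2:27017", "uptime": 32999000},
        {"name": "loc1-node3:27017", "uptime": 32998000},
        {"name": "loc2-node4:27017", "uptime": 1600},
        {"name": "loc2-node5:27017", "uptime": 1600},
        {"name": "loc2-node6:27017", "uptime": 1600},
    ],
}


def members_below(status, threshold_seconds):
    """Return the names of remote members whose uptime, as seen by the
    reporting node, is below threshold_seconds."""
    low = []
    for member in status.get("members", []):
        if member.get("self"):
            # Skip the node that produced the report; the question here
            # is only about how it sees the other members.
            continue
        uptime = member.get("uptime")
        if uptime is not None and uptime < threshold_seconds:
            low.append(member["name"])
    return sorted(low)


if __name__ == "__main__":
    # Members seen without interruption for less than one day.
    for name in members_below(SAMPLE_STATUS, 86400):
        print(name)
```

Running the same check against the output collected from a node in Location 2 would be expected to flag the Location 1 members instead, which matches the symmetric behaviour described in the ticket.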

Generated at Thu Feb 08 04:48:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.