[SERVER-31758] Incorrect uptime reported in rs.status() by recently restarted node Created: 28/Oct/17  Updated: 27/Oct/23  Resolved: 25/Jan/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: dabest1 Assignee: Spencer Brody (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Amazon Linux


Operating System: ALL
Participants:

 Description   

After restart of `mongod` on secondary node, `rs.status()` is reporting incorrect uptime of other nodes. Primary node is reporting correct uptime.



 Comments   
Comment by Spencer Brody (Inactive) [ 25/Jan/18 ]

I do not think there is a bug here, this is the intended behavior. replSetGetStatus reports what the node it is run against believes the state of the other nodes to be, it is fundamentally about reflecting one node's view of the world, which is not necessarily the correct view of the world as observed by a perfect 'god' observer. In this case the node that the command was run against never saw the node that was restarted as down, thus it reporting a high uptime for it is correct.

Comment by Mark Agarunov [ 12/Jan/18 ]

Hello MongoDBA_IRE,

Thank you for providing this information. As detailed by Spencer in the previous reply, this appears to be due to the node restarting so quickly that other nodes never see it as down. I've set the fixVersion on this ticket to "Needs Triage" to be planned against our currently scheduled work. Please follow this ticket for updates on this issue.

Thanks,
Mark

Comment by Spencer Brody (Inactive) [ 10/Jan/18 ]

From mongod_802.log:

2017-12-11T11:42:51.089+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

...

2017-12-11T11:42:51.633+0000 I CONTROL  [initandlisten] shutting down with code:0
2017-12-11T11:42:52.021+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-12-11T11:42:52.038+0000 I CONTROL  [initandlisten] MongoDB starting : pid=7955 port=28122 dbpath=/mongo/data 64-bit host=extdubmongodb802.eu.dsext.susq.com

...

2017-12-11T11:42:52.610+0000 I REPL     [rsSync] transition to SECONDARY

The whole restart happens in less than a second. That means the primary most likely just never actually noticed that that secondary was down, since it wouldn't ever have gone longer than the heartbeat timeout without responding. Looking at the logs of the primary supports this hypothesis, there are no messages indicating it ever considers node 802 down.

The important thing to note is that the 'uptime' reported in replSetGetStatus for other nodes is the amount of time that has passed since the last time this node considered the other node down. It is thus likely to be different than what the other node would report as its own uptime.

Comment by CiaranOS [X] [ 12/Dec/17 ]

Hi,

Done as adiagnostic.data.tar.gz

Thank you,

Comment by Mark Agarunov [ 11/Dec/17 ]

Hello MongoDBA_IRE,

Thank you for providing these logs. I've reopened the ticket and am investigating the issue. If possible, could you also archive (tar or zip) the $dbpath/diagnostic.data folder and attach it to this ticket? This contains some addition debugging information that may be helpful in diagnosing this issue.

Thanks,
Mark

Comment by CiaranOS [X] [ 11/Dec/17 ]

HI,

Separate/Independent of previous poster, I too am experiencing this issue. MongoDB v3.4.7

I used the secure link above to upload logs.

If I restart the secondary Instance the uptime does not change basically.

Cheers,
Ciaran

Comment by Mark Agarunov [ 01/Dec/17 ]

Hello dabest1,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Mark Agarunov [ 10/Nov/17 ]

Hello dabest1,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please:

  • Please provide the output of rs.status() from all affected nodes
  • Please provide the complete log files from all affected nodes
  • Which version(s) of mongodb are being used?

Thanks,
Mark

Comment by Mark Agarunov [ 30/Oct/17 ]

Hello dabest1,

Thank you for the response. I've generated a secure upload portal so that you can send us these files privately.

Thanks,
Mark

Comment by dabest1 [ 30/Oct/17 ]

Hello Mark,

Is there a private upload feature? Where I would be able to upload the files and they are not publicly visible.

Thanks.

Comment by Mark Agarunov [ 30/Oct/17 ]

Hello dabest1,

Thank you for the report. To get a better idea of why this may be happening, I'd like to request some additional information:

  • Please provide the output of rs.status() from all affected nodes
  • Please provide the complete log files from all affected nodes
  • Which version(s) of mongodb are being used?

This should give us a better insight into what may be causing the uptime to be reported incorrectly.

Thanks,
Mark

Generated at Thu Feb 08 04:28:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.