[SERVER-31758] Incorrect uptime reported in rs.status() by recently restarted node Created: 28/Oct/17 Updated: 27/Oct/23 Resolved: 25/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | dabest1 | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Amazon Linux |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
After restart of `mongod` on secondary node, `rs.status()` is reporting incorrect uptime of other nodes. Primary node is reporting correct uptime. |
| Comments |
| Comment by Spencer Brody (Inactive) [ 25/Jan/18 ] | |||||
|
I do not think there is a bug here, this is the intended behavior. replSetGetStatus reports what the node it is run against believes the state of the other nodes to be, it is fundamentally about reflecting one node's view of the world, which is not necessarily the correct view of the world as observed by a perfect 'god' observer. In this case the node that the command was run against never saw the node that was restarted as down, thus it reporting a high uptime for it is correct. | |||||
| Comment by Mark Agarunov [ 12/Jan/18 ] | |||||
|
Hello MongoDBA_IRE, Thank you for providing this information. As detailed by Spencer in the previous reply, this appears to be due to the node restarting so quickly that other nodes never see it as down. I've set the fixVersion on this ticket to "Needs Triage" to be planned against our currently scheduled work. Please follow this ticket for updates on this issue. Thanks, | |||||
| Comment by Spencer Brody (Inactive) [ 10/Jan/18 ] | |||||
|
From mongod_802.log:
...
...
The whole restart happens in less than a second. That means the primary most likely just never actually noticed that that secondary was down, since it wouldn't ever have gone longer than the heartbeat timeout without responding. Looking at the logs of the primary supports this hypothesis, there are no messages indicating it ever considers node 802 down. The important thing to note is that the 'uptime' reported in replSetGetStatus for other nodes is the amount of time that has passed since the last time this node considered the other node down. It is thus likely to be different than what the other node would report as its own uptime. | |||||
| Comment by CiaranOS [X] [ 12/Dec/17 ] | |||||
|
Hi, Done as adiagnostic.data.tar.gz Thank you, | |||||
| Comment by Mark Agarunov [ 11/Dec/17 ] | |||||
|
Hello MongoDBA_IRE, Thank you for providing these logs. I've reopened the ticket and am investigating the issue. If possible, could you also archive (tar or zip) the $dbpath/diagnostic.data folder and attach it to this ticket? This contains some addition debugging information that may be helpful in diagnosing this issue. Thanks, | |||||
| Comment by CiaranOS [X] [ 11/Dec/17 ] | |||||
|
HI, Separate/Independent of previous poster, I too am experiencing this issue. MongoDB v3.4.7 I used the secure link above to upload logs. If I restart the secondary Instance the uptime does not change basically. Cheers, | |||||
| Comment by Mark Agarunov [ 01/Dec/17 ] | |||||
|
Hello dabest1, We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Thanks, | |||||
| Comment by Mark Agarunov [ 10/Nov/17 ] | |||||
|
Hello dabest1, We still need additional information to diagnose the problem. If this is still an issue for you, would you please:
Thanks, | |||||
| Comment by Mark Agarunov [ 30/Oct/17 ] | |||||
|
Hello dabest1, Thank you for the response. I've generated a secure upload portal so that you can send us these files privately. Thanks, | |||||
| Comment by dabest1 [ 30/Oct/17 ] | |||||
|
Hello Mark, Is there a private upload feature? Where I would be able to upload the files and they are not publicly visible. Thanks. | |||||
| Comment by Mark Agarunov [ 30/Oct/17 ] | |||||
|
Hello dabest1, Thank you for the report. To get a better idea of why this may be happening, I'd like to request some additional information:
This should give us a better insight into what may be causing the uptime to be reported incorrectly. Thanks, |