Documentation / DOCS-11543

A new explanation of "uptime" in replSetGetStatus

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: manual, Server
    • Labels:
      None

      Description

      I recently discovered I have had the wrong impression about the uptime values in rs.status() (i.e. replSetGetStatus) output for a very long time (~3 years).

      I assumed the uptime was that of the operating-system process (Unix/Windows), but this is true only for the "self": true member, i.e. the node you run rs.status() on.

      For the other members it is the time elapsed since the first heartbeat was returned from them. So if you restart a node and then run rs.status() on it, every uptime it reports starts again from zero. Run from any other node, the output still shows higher uptimes for the nodes that kept running, and only the restarted node has a small uptime.
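
To make the distinction concrete, here is a minimal Python sketch that labels each member's uptime according to the reading described above. The `describe_uptimes` helper and the `sample_status` document are hypothetical and written only for illustration. The field names (`members`, `name`, `self`, `uptime`) follow the rs.status() output, but the host names and values are made up.

```python
def describe_uptimes(status: dict) -> list:
    """Label each member's uptime by what it measures, per this ticket's reading."""
    rows = []
    for member in status.get("members", []):
        if member.get("self", False):
            # The member the command was run on: process uptime.
            meaning = "seconds since this mongod process started"
        else:
            # Any other member: heartbeat-based, relative to the reporting node.
            meaning = "seconds since the reporting node began receiving heartbeats from it"
        rows.append((member["name"], member.get("uptime", 0), meaning))
    return rows


# Hypothetical, hand-written document shaped like rs.status() output, as seen
# from host-b shortly after host-b was restarted. With PyMongo, a real document
# could be fetched with MongoClient(...).admin.command("replSetGetStatus").
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "host-a:27017", "stateStr": "PRIMARY", "uptime": 298},
        {"name": "host-b:27017", "stateStr": "SECONDARY", "uptime": 300, "self": True},
        {"name": "host-c:27017", "stateStr": "SECONDARY", "uptime": 297},
    ],
}

for name, uptime, meaning in describe_uptimes(sample_status):
    print(f"{name}: {uptime}s ({meaning})")
```

In this made-up output all three values are small, even though host-a and host-c may have been running for weeks: the numbers only say how long host-b has been hearing from them.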

      The manual replSetGetStatus page currently says:

      replSetGetStatus.members[n].uptime

      The uptime field holds a value that reflects the number of seconds that this member has been online.

      This value does not appear for the member that returns the rs.status() data.

      The "been online" description is vague, and it's easy to see rs.status() output that appears to confirm the common notion of uptime, i.e. that of a process. Instead, the manual should convey that the value starts with the heartbeat initiation logic and that it is relative to the member you are executing the command on.

      I also see that the second line is wrong. The member that returns the rs.status() data shows an uptime too, since 3.2 for certain and possibly earlier. Of course a node has no heartbeat data for itself, but in the code (ReplicationCoordinatorImpl::processReplSetGetStatus) I see it is calculated as 'now - serverGlobalParams.started'.

      I suggest the description be changed to the following.

      The uptime field shows how long heartbeats have been established to that node. For the member the command is being run on there is no heartbeat data, so the time since the last restart is displayed instead.

      The value resets when the process is restarted, so a node that was restarted an hour ago will report 3600 for itself and values <= 3600 for the other nodes it has connected to since the restart.
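
The arithmetic behind the proposed wording can be sketched as a small Python model. This is a simplification based only on the description in this ticket, not the server's implementation. `reported_uptime` and its parameters are hypothetical names, and timestamps are plain seconds.

```python
from typing import Optional


def reported_uptime(now: float, reporter_started: float,
                    first_heartbeat: Optional[float] = None) -> int:
    """Simplified model of the uptime one node reports for a member.

    first_heartbeat=None means the member is the reporting node itself.
    """
    if first_heartbeat is None:
        # Self entry: now minus the reporting process's start time.
        return int(now - reporter_started)
    # Remote entry: time since the reporter first heard a heartbeat from it.
    return int(now - first_heartbeat)


# A node restarted at t=0 and queried one hour later (t=3600). It first heard
# heartbeats from its two peers at t=2 and t=5 (made-up values).
now, restarted_at = 3600, 0
print("self:", reported_uptime(now, restarted_at))
print("peer 1:", reported_uptime(now, restarted_at, first_heartbeat=2))
print("peer 2:", reported_uptime(now, restarted_at, first_heartbeat=5))
```

Under this model the restarted node never reports a peer uptime larger than its own, because it cannot have received a heartbeat before it started.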

            People

            Assignee:
            ravind.kumar Ravind Kumar (Inactive)
            Reporter:
            akira.kurogane Akira Kurogane
            Last commenter:
            Kay Kim (Inactive)

              Dates

              Days since reply:
              2 years, 18 weeks, 5 days ago