Documentation / DOCS-11543

A new explanation of "uptime" in replSetGetStatus



    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: manual, Server
    • Labels:


      I recently discovered that I have had the wrong impression of the uptime values in rs.status() (i.e. replSetGetStatus) output for a very long time (~3 years).

      I thought the uptime was that of the Unix/Windows process, but this is only true for the uptime of the "self": true node, i.e. the node you execute rs.status() on.

      For the other nodes, it is the time elapsed since the first heartbeat response was received from them. So if you restart a node and then run rs.status() on it, all the uptimes it reports will have reset to start from zero. When rs.status() is run on the other nodes, however, they show higher uptimes, and only the restarted node has a small uptime.

      The manual replSetGetStatus page currently says:


      The uptime field holds a value that reflects the number of seconds that this member has been online.

      This value does not appear for the member that returns the rs.status() data.

      The "been online" description is vague, and it's easy to see rs.status() output that seems to confirm it means the common idea of uptime, i.e. the uptime of a process. Instead, the manual should convey that the value starts from the heartbeat initiation logic, and that it is relative to the member you're executing the command on.

      I also see that the second line is wrong. The member that returns the rs.status() data shows an uptime too, certainly since 3.2 and possibly even earlier. Of course a node doesn't have heartbeat data for itself, but in the code (ReplicationCoordinatorImpl::processReplSetGetStatus) I see it is calculated as 'now - serverGlobalParams.started'.

      I suggest the description be changed to the following.

      The uptime field shows how long heartbeats have been established to that node. For the member the command is run on, there is no heartbeat data, so the time since its last restart is displayed instead.

      The value resets when the process is restarted, so a node that was restarted an hour ago will report 3600 for itself and values <= 3600 for the other nodes it has connected to since the restart.
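The proposed semantics can be made concrete with a small Python model of how one node might derive the uptime values it reports. This is a sketch, not the server code. The class and method names are hypothetical, `process_started` stands in for serverGlobalParams.started, and clearing a peer's start time after a failed heartbeat is an assumption. That assumption is made to match the observation above that, seen from the other nodes, only the restarted node shows a small uptime.

```python
from typing import Dict, Optional


class StatusView:
    """Hypothetical model of the uptime values one node reports in replSetGetStatus.

    Names are illustrative only; they are not the server's real data structures.
    """

    def __init__(self, self_name: str, process_started: float) -> None:
        self.self_name = self_name
        # Plays the role of serverGlobalParams.started for the "self": true node.
        self.process_started = process_started
        # peer name -> time (seconds) the first heartbeat response arrived
        self.first_heartbeat: Dict[str, float] = {}

    def on_heartbeat_response(self, peer: str, now: float) -> None:
        # Later heartbeats do not move the start point; only the first one counts.
        self.first_heartbeat.setdefault(peer, now)

    def on_heartbeat_failure(self, peer: str) -> None:
        # ASSUMPTION: a failed heartbeat (e.g. the peer restarting) clears the
        # start point, so the next successful heartbeat starts the count again.
        self.first_heartbeat.pop(peer, None)

    def uptime(self, member: str, now: float) -> Optional[int]:
        if member == self.self_name:
            # No heartbeat data for itself: time since this process started.
            return int(now - self.process_started)
        first = self.first_heartbeat.get(member)
        # In this model, no heartbeat response yet means no uptime to report.
        return None if first is None else int(now - first)


# Node B was restarted at t=0 and heard back from A and C shortly after.
b = StatusView("B", process_started=0.0)
b.on_heartbeat_response("A", now=2.0)
b.on_heartbeat_response("C", now=3.0)
b.on_heartbeat_response("A", now=4.0)  # later heartbeat, start point unchanged
print(b.uptime("B", 3600.0), b.uptime("A", 3600.0), b.uptime("C", 3600.0))  # 3600 3598 3597
```

One hour after the restart, B reports 3600 for itself and slightly less than 3600 for the peers it has reached since then, which is the "3600 for itself and <= 3600 for the other nodes" case described in the proposed text.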




            ravind.kumar Ravind Kumar (Inactive)
            akira.kurogane Akira Kurogane
            Last commenter:
            Kay Kim (Inactive)


              Days since reply:
              2 years, 18 weeks, 5 days ago
              Date of 1st Reply: