Uploaded image for project: 'MongoDB ETL Tools'
  1. MongoDB ETL Tools
  2. TOOLS-1628

mongostat should use server time when computing rates, not client time

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Accepted
    • Priority: Minor - P4
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mongostat
    • Labels:
      None

      Description

      By default, for metrics like inserts, queries, etc, mongostat shows values that are rates (rather than plain diffs). That is, it shows the delta of the metric divided by the delta in time.

      However, when doing this mongostat uses the local client wall time (time.Now()) after the serverStatus response has been received (code). If there has been some delay in receiving this response (eg. network glitch, server under heavy load), then this time will not be when the statistics were actually collected by the server. This will cause the computed rates to be inaccurate — because the time delta (on the denominator of the division) doesn't correspond to the metrics delta (on the numerator). As a result the mongostat output is not be a true reflection of server activity, which could lead to false conclusions and misdiagnosis.

      Fortunately, the server returns the time at which the metrics were collected (more or less) in the localTime field of the serverStatus response. mongostat should use the value of the localTime field for the "time" column and when computing rates.


      In the output below, note that the "time" values are a few millis behind serverStatus.localTime, and it isn't completely consistent — usually it's 2ms, but it can be anywhere between 1ms and 5ms.

      $ mongostat --version
      mongostat version: r3.4.2
      git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
      Go version: go1.7
         os: linux
         arch: amd64
         compiler: gc
      OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
      $ mongostat -o 'time,localTime'
                     time                          localTime
      Mar 10 17:01:18.016 2017-03-10 17:01:18.014 +1100 AEDT
      Mar 10 17:01:19.016 2017-03-10 17:01:19.014 +1100 AEDT
      Mar 10 17:01:20.016 2017-03-10 17:01:20.014 +1100 AEDT
      Mar 10 17:01:21.019 2017-03-10 17:01:21.015 +1100 AEDT
      Mar 10 17:01:22.016 2017-03-10 17:01:22.014 +1100 AEDT
      Mar 10 17:01:23.016 2017-03-10 17:01:23.014 +1100 AEDT
      Mar 10 17:01:24.016 2017-03-10 17:01:24.015 +1100 AEDT
      Mar 10 17:01:25.019 2017-03-10 17:01:25.014 +1100 AEDT
      Mar 10 17:01:26.016 2017-03-10 17:01:26.014 +1100 AEDT
      Mar 10 17:01:27.016 2017-03-10 17:01:27.015 +1100 AEDT
                     time                          localTime
      Mar 10 17:01:28.016 2017-03-10 17:01:28.014 +1100 AEDT
      Mar 10 17:01:29.016 2017-03-10 17:01:29.014 +1100 AEDT
      Mar 10 17:01:30.019 2017-03-10 17:01:30.015 +1100 AEDT
      Mar 10 17:01:31.016 2017-03-10 17:01:31.015 +1100 AEDT
      Mar 10 17:01:32.016 2017-03-10 17:01:32.014 +1100 AEDT
      Mar 10 17:01:33.016 2017-03-10 17:01:33.014 +1100 AEDT
      Mar 10 17:01:34.017 2017-03-10 17:01:34.014 +1100 AEDT
      Mar 10 17:01:35.016 2017-03-10 17:01:35.014 +1100 AEDT
      Mar 10 17:01:36.016 2017-03-10 17:01:36.015 +1100 AEDT
      Mar 10 17:01:37.016 2017-03-10 17:01:37.014 +1100 AEDT
      


      Here is an actual output showing the problem with a heavily loaded mongod. The server is doing around 45K queries per second. While the serverStatus response is evenly spaced by about 1 second, the rate is fine.

      However, the serverStatus response received at 11:24:13.774 has been delayed by about 1 second. This causes the query rate to be falsely reported as about half what it actually is, or 22K/s.

      Then, the subsequent response is received by mongostat very soon afterwards, at 11:24:13.785, which is only 11ms later. Thus, mongostat computes (and displays) a query rate of (about) 45K/0.01 = 4.5M/s.

      This is almost certainly a pure artifact of this mongostat problem — the server is extremely likely to have just been doing ~45k/s queries throughout this whole time, but that is not what mongostat makes it look like.

      The same can be seen for all the other differentiable metrics, such as insert, update, getmore, command, net_in, and net_out.

      (h/t to Dmitry Agranat for noticing these implausible numbers in the first place)

        Attachments

          Activity

            People

            • Assignee:
              backlog-server-tools Backlog - Server Tools Team
              Reporter:
              kevin.pulo Kevin Pulo
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated: