-
Type: Bug
-
Resolution: Unresolved
-
Priority: Minor - P4
-
None
-
Affects Version/s: None
-
Component/s: mongostat
-
None
-
1,927
-
v3.4, v3.2, v3.0
-
(copied to CRM)
By default, for metrics like inserts, queries, etc, mongostat shows values that are rates (rather than plain diffs). That is, it shows the delta of the metric divided by the delta in time.
However, when doing this mongostat uses the local client wall time (time.Now()) after the serverStatus response has been received (code). If there has been some delay in receiving this response (eg. network glitch, server under heavy load), then this time will not be when the statistics were actually collected by the server. This will cause the computed rates to be inaccurate — because the time delta (on the denominator of the division) doesn't correspond to the metrics delta (on the numerator). As a result the mongostat output is not be a true reflection of server activity, which could lead to false conclusions and misdiagnosis.
Fortunately, the server returns the time at which the metrics were collected (more or less) in the localTime field of the serverStatus response. mongostat should use the value of the localTime field for the "time" column and when computing rates.
In the output below, note that the "time" values are a few millis behind serverStatus.localTime, and it isn't completely consistent — usually it's 2ms, but it can be anywhere between 1ms and 5ms.
$ mongostat --version mongostat version: r3.4.2 git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b Go version: go1.7 os: linux arch: amd64 compiler: gc OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016 $ mongostat -o 'time,localTime' time localTime Mar 10 17:01:18.016 2017-03-10 17:01:18.014 +1100 AEDT Mar 10 17:01:19.016 2017-03-10 17:01:19.014 +1100 AEDT Mar 10 17:01:20.016 2017-03-10 17:01:20.014 +1100 AEDT Mar 10 17:01:21.019 2017-03-10 17:01:21.015 +1100 AEDT Mar 10 17:01:22.016 2017-03-10 17:01:22.014 +1100 AEDT Mar 10 17:01:23.016 2017-03-10 17:01:23.014 +1100 AEDT Mar 10 17:01:24.016 2017-03-10 17:01:24.015 +1100 AEDT Mar 10 17:01:25.019 2017-03-10 17:01:25.014 +1100 AEDT Mar 10 17:01:26.016 2017-03-10 17:01:26.014 +1100 AEDT Mar 10 17:01:27.016 2017-03-10 17:01:27.015 +1100 AEDT time localTime Mar 10 17:01:28.016 2017-03-10 17:01:28.014 +1100 AEDT Mar 10 17:01:29.016 2017-03-10 17:01:29.014 +1100 AEDT Mar 10 17:01:30.019 2017-03-10 17:01:30.015 +1100 AEDT Mar 10 17:01:31.016 2017-03-10 17:01:31.015 +1100 AEDT Mar 10 17:01:32.016 2017-03-10 17:01:32.014 +1100 AEDT Mar 10 17:01:33.016 2017-03-10 17:01:33.014 +1100 AEDT Mar 10 17:01:34.017 2017-03-10 17:01:34.014 +1100 AEDT Mar 10 17:01:35.016 2017-03-10 17:01:35.014 +1100 AEDT Mar 10 17:01:36.016 2017-03-10 17:01:36.015 +1100 AEDT Mar 10 17:01:37.016 2017-03-10 17:01:37.014 +1100 AEDT
Here is an actual output showing the problem with a heavily loaded mongod. The server is doing around 45K queries per second. While the serverStatus response is evenly spaced by about 1 second, the rate is fine.
However, the serverStatus response received at 11:24:13.774 has been delayed by about 1 second. This causes the query rate to be falsely reported as about half what it actually is, or 22K/s.
Then, the subsequent response is received by mongostat very soon afterwards, at 11:24:13.785, which is only 11ms later. Thus, mongostat computes (and displays) a query rate of (about) 45K/0.01 = 4.5M/s.
This is almost certainly a pure artifact of this mongostat problem — the server is extremely likely to have just been doing ~45k/s queries throughout this whole time, but that is not what mongostat makes it look like.
The same can be seen for all the other differentiable metrics, such as insert, update, getmore, command, net_in, and net_out.
(h/t to dmitry.agranat for noticing these implausible numbers in the first place)