[SERVER-5905] Add data collection and command to get histogram of query response times Created: 23/May/12  Updated: 22/Mar/17  Resolved: 24/Jun/16

Status: Closed
Project: Core Server
Component/s: Diagnostics, Performance
Affects Version/s: None
Fix Version/s: 3.3.9

Type: New Feature
Priority: Major - P3
Reporter: Tad Marshall
Assignee: Kevin Albertson
Resolution: Done
Votes: 5
Labels: neweng, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-8914 Document the $collStats aggregation s... Closed
is documented by DOCS-8165 Document the operation latency histogram Closed
Duplicate
is duplicated by SERVER-7774 Add jstest for db.adminCommand('top') Closed
Related
related to SERVER-5828 Metric/Stats Tracking Closed
Backwards Compatibility: Fully Compatible
Sprint: Integrate+Tuning 16 (06/24/16), Integration 17 (07/15/16)

 Description   

We should have instrumentation to characterize the overall workload and response time of a mongod or mongos server. A histogram whose buckets have log-base-2 microsecond resolution (each bucket spanning twice the range of the previous one) would be a nice start. Here's a straw-man proposal:

1) For every request from a client, log the time it was received using the least expensive high resolution method. On Windows, this would be QueryPerformanceCounter().
2) When the response is complete, compute the elapsed time in microseconds. On Windows, this would be another call to QueryPerformanceCounter() and division by a precomputed conversion factor.
3) Add 1 to the bucket associated with this time interval. Bucket 0 gets all times below 1 microsecond, bucket 1 gets times from 1 to 2 microseconds, bucket 2 gets times from 2 to 4 microseconds, then 4 to 8, etc. The 31 buckets above bucket 0 would cover times up to 2^31 microseconds (about 2147 seconds), and anything taking longer than that would go in the last bucket, so 32 buckets would cover the time periods we are most interested in.
4) Every 10 seconds, add the histogram to a "since started" histogram, write it to a capped collection sized for one week of data, save a snapshot copy and then zero it.
5) Provide $cmd commands to fetch the most recent snapshot and the "since started" histogram.
6) Give MMS the ability to show the most recent snapshot and the "since started" histogram.
7) For extra credit, MMS could show a contour plot or some other 3D display of response time history, showing the changing shape of the curve.
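
Steps 1 through 4 above can be sketched roughly as follows. This is an illustration of the proposal only, not the server's implementation: the names bucket_index, LatencyHistogram, roll and timed are hypothetical, time.perf_counter_ns() stands in for QueryPerformanceCounter(), and the capped-collection write in step 4 is left out. It assumes 32 buckets, with the last bucket also absorbing anything longer than 2^31 microseconds.

```python
import time

NUM_BUCKETS = 32  # assumption: bucket 0 plus 31 power-of-two buckets (step 3)


def bucket_index(elapsed_us: int) -> int:
    """Map an elapsed time in whole microseconds to a bucket number.

    Bucket 0 holds times below 1 us, bucket k (k >= 1) holds
    [2**(k-1), 2**k) us, and the last bucket also takes anything longer.
    """
    if elapsed_us < 1:
        return 0
    # For a positive integer n, bit_length() is floor(log2(n)) + 1.
    return min(elapsed_us.bit_length(), NUM_BUCKETS - 1)


class LatencyHistogram:
    """Hypothetical holder for the interval, snapshot and since-started counts."""

    def __init__(self):
        self.interval = [0] * NUM_BUCKETS       # counts since the last roll
        self.snapshot = [0] * NUM_BUCKETS       # copy of the last finished interval
        self.since_started = [0] * NUM_BUCKETS  # running total

    def record(self, elapsed_us: int) -> None:
        # Step 3: add 1 to the bucket for this elapsed time.
        self.interval[bucket_index(elapsed_us)] += 1

    def roll(self):
        # Step 4 (without persistence): fold the interval into the running
        # total, keep it as the snapshot, then start a zeroed interval.
        for i, n in enumerate(self.interval):
            self.since_started[i] += n
        self.snapshot = self.interval
        self.interval = [0] * NUM_BUCKETS
        return self.snapshot


def timed(hist: LatencyHistogram, handler, *args):
    """Steps 1 and 2: time one request handler and record the result."""
    start = time.perf_counter_ns()
    try:
        return handler(*args)
    finally:
        hist.record((time.perf_counter_ns() - start) // 1000)  # ns -> us
```

A caller would wrap each request in timed() and invoke roll() from a 10-second timer. Using the integer bit length avoids a floating-point logarithm on the hot path.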

Once the baseline functionality is working, we could consider doing this by database, by collection, by request type or by some other criterion. These would be additional instances of the same feature.
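
One way to picture "additional instances of the same feature" is a map from a key to its own histogram. The sketch below is an assumption about how that might look, not a description of any existing server code: KeyedHistograms and the (database, collection, request type) key tuple are hypothetical, and the bucket layout is the 32-bucket scheme from step 3.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # assumption: same layout as the baseline histogram


def bucket_index(elapsed_us: int) -> int:
    """Bucket 0 is < 1 us; bucket k is [2**(k-1), 2**k) us; last bucket overflows."""
    if elapsed_us < 1:
        return 0
    return min(elapsed_us.bit_length(), NUM_BUCKETS - 1)


class KeyedHistograms:
    """One histogram per key, e.g. (database, collection, request type)."""

    def __init__(self):
        # A fresh zeroed histogram is created the first time a key is recorded.
        self._by_key = defaultdict(lambda: [0] * NUM_BUCKETS)

    def record(self, key, elapsed_us: int) -> None:
        self._by_key[key][bucket_index(elapsed_us)] += 1

    def get(self, key):
        # Return a copy; unknown keys read as an empty histogram.
        return list(self._by_key.get(key, [0] * NUM_BUCKETS))

    def total(self):
        # The server-wide histogram is the bucket-wise sum over all keys.
        totals = [0] * NUM_BUCKETS
        for counts in self._by_key.values():
            for i, n in enumerate(counts):
                totals[i] += n
        return totals
```

Because every instance shares one bucket layout, per-key histograms can be summed bucket by bucket to recover the overall one.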

There are a lot of things that we could learn by having this information:
1) If a query was slow at one time but not at another, was there a difference in the number of requests it was competing with in the two cases?
2) Is a workload doing mostly very fast stuff with a little slow stuff, or is everything slow?
3) Does a change to something in the system change the mix of response times?
4) Do response times follow a recognizable pattern, like a bell curve with a visible center, or a skew towards fast responses, or a curve with multiple peaks?
5) Is anything really fast, or is the minimum response time in the millisecond and above range?
6) Do we have periods with little visible activity followed by periods when many slow requests complete?
7) Does the addition of a new application, or a new shard, or a new mongos change the response time pattern?
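
Several of these questions reduce to simple arithmetic on the bucket counts. The helpers below are a hedged sketch with hypothetical names, assuming the bucket layout from the proposal (bucket 0 below 1 microsecond, bucket k covering 2^(k-1) to 2^k microseconds). Because the buckets are coarse, the results are bounds rather than exact values.

```python
def bucket_lower_bound_us(index: int) -> int:
    """Inclusive lower bound of a bucket in microseconds."""
    return 0 if index == 0 else 1 << (index - 1)


def percentile_upper_bound_us(counts, fraction: float):
    """Upper bound (us) of the bucket containing the given percentile.

    Returns None for an empty histogram. For the last (overflow) bucket
    the returned value is only its nominal bound.
    """
    total = sum(counts)
    if total == 0:
        return None
    needed = fraction * total
    running = 0
    for index, n in enumerate(counts):
        running += n
        if running >= needed:
            return 1 << index
    return 1 << (len(counts) - 1)


def slow_fraction(counts, threshold_us: int) -> float:
    """Fraction of operations in buckets that lie entirely at or above threshold_us."""
    total = sum(counts)
    if total == 0:
        return 0.0
    slow = sum(n for i, n in enumerate(counts)
               if bucket_lower_bound_us(i) >= threshold_us)
    return slow / total


def delta(later, earlier):
    """Bucket-wise difference between two cumulative histograms."""
    return [b - a for a, b in zip(earlier, later)]
```

For example, slow_fraction speaks to question 2 (mostly fast with a little slow, or everything slow), and delta applied to two "since started" readings gives the mix of response times between them, which is what questions 1, 3 and 7 compare.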

The better we can characterize workloads and our response to them, the better we can diagnose problems and propose solutions. All to the good.



 Comments   
Comment by Githook User [ 24/Jun/16 ]

Author: Kevin Albertson (kevinAlbs) <kevin.albertson@10gen.com>

Message: SERVER-5905 Add collStats aggregation stage
Branch: master
https://github.com/mongodb/mongo/commit/6f0af04446b6dcd682ca844757f023f7f5c900cc

Comment by Githook User [ 24/Jun/16 ]

Author: Kevin Albertson (kevinAlbs) <kevin.albertson@10gen.com>

Message: SERVER-5905 Add operation latency histogram
Branch: master
https://github.com/mongodb/mongo/commit/6c755905c31ac284d88077500ebba021d20b3626

Comment by Githook User [ 24/Jun/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Revert "SERVER-5905 Add operation latency histogram"

This reverts commit c7794350b056cdea85e1c6185a7dda4579936179.
Branch: master
https://github.com/mongodb/mongo/commit/7084de1f754ffaf94ead1a5e1bd8475e3a115b76

Comment by Githook User [ 24/Jun/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Revert "SERVER-5905 Add collStats aggregation stage"

This reverts commit a22d2843ccab7f0333434d1124358c5c182427f6.
Branch: master
https://github.com/mongodb/mongo/commit/a5e8fb2029c4d96daa0d07f287bf07a68a3ec1fa

Comment by Kevin Albertson [ 23/Jun/16 ]

We need to add a documentation page describing the operation latency histogram and its usage.

Comment by Githook User [ 23/Jun/16 ]

Author: Kevin Albertson (kevinAlbs) <kevin.albertson@10gen.com>

Message: SERVER-5905 Add collStats aggregation stage

Signed-off-by: Kyle Suarez <kyle.suarez@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/a22d2843ccab7f0333434d1124358c5c182427f6

Comment by Githook User [ 23/Jun/16 ]

Author: Kevin Albertson (kevinAlbs) <kevin.albertson@10gen.com>

Message: SERVER-5905 Add operation latency histogram

Signed-off-by: Kyle Suarez <kyle.suarez@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/c7794350b056cdea85e1c6185a7dda4579936179
