[SERVER-29304] Exclude time spent blocking for awaitData from latency metrics Created: 19/May/17  Updated: 30/Oct/23  Resolved: 14/Jun/17

Status: Closed
Project: Core Server
Component/s: Diagnostics, Querying
Affects Version/s: None
Fix Version/s: 3.5.9

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-18637 Measure initial lock time in operatio... Closed
is duplicated by SERVER-29189 Top and op latency histogram counters... Closed
is duplicated by SERVER-29435 Latencies in Top exclude lock acquisi... Closed
is duplicated by SERVER-34998 Ignore oplog timeout in serverStatus ... Closed
Related
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4
Sprint: Query 2017-06-19
Participants:
Case:

 Description   

When a getMore is issued against a tailable cursor with the awaitData option and no results are currently available, the thread executing the getMore will wait on a condition variable for additional results to become available via insertion into the capped collection. This allows clients to periodically poll a capped collection for new records.

This waiting time is currently incorporated into some of the latency measurements surfaced by the server (e.g. in diagnostic logs, the system.profile collection, and the opLatencies section of serverStatus). This can be counter-intuitive to users, since reporting on average latency may be skewed by this "expected" waiting time on systems that are mostly idle. Therefore, we should exclude this "expected latency" from the following:

  • system.profile
  • slow query logs
  • top
  • global operation latency histogram
  • per-collection operation latency histogram

Note, however, that this behavior change will not apply to db.currentOp()'s secs_running or microsecs_running, since this output is more often used to gauge the true latency of active operations, rather than generating reports after the fact.



 Comments   
Comment by Githook User [ 14/Jun/17 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-29304 Exclude time spent blocking for awaitData from getMore latency reporting.

This change applies to various server diagnostics which
include latency metrics: the slow query logs, the
system.profile collection, top, global operation latency
stats, and per-collection operation latency stats. The
secs_running and microsecs_running fields reported in
currentOp output, on the other hand, will include time spent
blocking for awaitData.

Furthermore, this change fixes the following:

Generated at Thu Feb 08 04:20:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.