[SERVER-33155] Export/report lock-held time statistics Created: 07/Feb/18  Updated: 21/Mar/18  Resolved: 21/Feb/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Bartley Assignee: Kelsey Schubert
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-29632 Expose time spent holding locks, excl... Backlog
is related to SERVER-33156 Report lock statistics per collection... Closed
Participants:

 Description   

serverStatus includes stats for lock acquisitions and deadlocks, but doesn't report lock held time. It would be good to include this, since that information is useful for diagnosing problematic nodes. I can provide a patch that we've been running in production for several months if useful.
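For illustration only, here is a minimal Python sketch of tracking lock held time separately from acquisition wait time, per resource and lock mode. It is not the production patch mentioned above, whose contents aren't shown here. The class name `LockTimeStats`, the `hold()` helper, and the `timeHeldMicros` field are hypothetical; `acquireCount` and `timeAcquiringMicros` are only loosely modeled on the existing serverStatus `locks` section.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class LockTimeStats:
    """Hypothetical per-(resource, mode) lock statistics.

    Wait time (blocked in acquire) and held time (between acquire and
    release) are accumulated separately, so held time excludes queueing.
    """

    def __init__(self):
        self._mutex = threading.Lock()  # guards the counters below
        self._count = defaultdict(int)
        self._wait_secs = defaultdict(float)
        self._held_secs = defaultdict(float)

    @contextmanager
    def hold(self, lock, resource, mode):
        """Acquire `lock`, yield to the caller, then record timings.

        `resource` (e.g. a namespace) and `mode` (e.g. "S" or "X") are
        accounting labels only; `lock` is any object with acquire()/release().
        """
        key = (resource, mode)
        wait_start = time.perf_counter()
        lock.acquire()
        acquired = time.perf_counter()
        try:
            yield
        finally:
            # Runs even if the caller's block raises, so the lock is always released.
            released = time.perf_counter()
            lock.release()
            with self._mutex:
                self._count[key] += 1
                self._wait_secs[key] += acquired - wait_start
                self._held_secs[key] += released - acquired

    def report(self):
        """Return a serverStatus-like dict keyed by "resource.mode"."""
        with self._mutex:
            return {
                f"{res}.{mode}": {
                    "acquireCount": self._count[(res, mode)],
                    "timeAcquiringMicros": round(self._wait_secs[(res, mode)] * 1e6),
                    "timeHeldMicros": round(self._held_secs[(res, mode)] * 1e6),
                }
                for (res, mode) in self._count
            }
```

Because every acquire/release pair is recorded as its own interval, an operation that releases and reacquires a lock several times simply adds several intervals to the same counter.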



 Comments   
Comment by Kelsey Schubert [ 21/Feb/18 ]

Hi bartle,

I'm closing this ticket in favor of SERVER-33156, as we have some concerns about including collection-specific lock metrics in serverStatus, and including them in collStats would serve your needs.

Kind regards,
Kelsey

Comment by David Bartley [ 09/Feb/18 ]

Since WiredTiger supports document-level locking, we typically find that lock held time is a pretty good proxy for operation time (lock acquisition time is usually negligible). If mongo wanted to support per-collection/per-db operation times, we'd collect those metrics, but I think we'd still opt to collect more detailed lock information as well, as it's always better to over-collect metrics.

I think it'd be fine to report per-collection information via collStats and dbStats, though we've been fine with having that information reported via serverStatus (it means that our metrics collector only needs to issue a single command, versus one per collection, which tends to be fairly slow). Since serverStatus already supports a mechanism to limit section output, one could imagine adding an extendedLocks section that would be disabled by default?
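The opt-in idea can be sketched roughly as follows: each section is built only when it is included, and a hypothetical `extendedLocks` section (name taken from the suggestion above) stays off unless explicitly requested. This is a Python sketch, not MongoDB's implementation; `build_server_status` and the section registry are assumptions, though the real serverStatus command does accept `<section>: 0` to suppress a section.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: section name -> (included by default?, builder).
Section = Tuple[bool, Callable[[], dict]]


def build_server_status(request: Dict[str, int], sections: Dict[str, Section]) -> dict:
    """Include each section unless the request turns it off with 0;
    sections that are off by default appear only when requested with 1."""
    out = {}
    for name, (on_by_default, build) in sections.items():
        if bool(request.get(name, on_by_default)):
            out[name] = build()  # build lazily, only when included
    return out


# Placeholder values for illustration only.
SECTIONS: Dict[str, Section] = {
    "locks": (True, lambda: {"Global": {"acquireCount": {"r": 10}}}),
    # Per-collection data can be large, so this section is opt-in.
    "extendedLocks": (False, lambda: {"test.counters": {"timeHeldMicros": {"w": 1234}}}),
}
```

With this shape, a metrics collector still issues a single command and pays the cost of per-collection output only when it asks for it, e.g. `build_server_status({"serverStatus": 1, "extendedLocks": 1}, SECTIONS)`.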

Comment by Bruce Lucas (Inactive) [ 08/Feb/18 ]

Hi David,

Currently we have global operation count and operation latency metrics which give some related information. I think your request differs from this in two ways:

  • You're asking for metrics regarding lock times. This is similar, except that 1) it excludes time spent queued, 2) it can distinguish between different kinds of lock, and 3) a single operation can release and acquire locks multiple times. Would operation latency counts and metrics fill your needs?
  • You are asking for per-collection information. Generally, per-collection information is something we wouldn't add to serverStatus, as it could be quite large, but we could possibly add it to, for example, collection stats. Would this work for your use case?

Thanks,
Bruce

Comment by David Bartley [ 08/Feb/18 ]

We've only found them useful in conjunction with https://jira.mongodb.org/browse/SERVER-33156; with that, there are a few ways we've seen them be useful:
1) At a quick glance, it tells us which collections are "busiest", both in terms of reads and writes
2) When we see a CPU spike on a node, we can almost always correlate it with an increase in read lock time on a specific collection, which helps narrow our debugging efforts
3) When we do version upgrades, we use these as a coarse way of determining if some pathological queries got worse (e.g. we saw changes between versions around specific collections that were effectively being used to contain global counters, where we do lots of findAndModifies)

Comment by Kelsey Schubert [ 08/Feb/18 ]

Hi bartle,

Thanks for the feature request; we'd be interested in reviewing your patch - would you be willing to open a pull request?

Could you also speak to the types of issues you are using these metrics to diagnose? I suspect that this information would be helpful to have, but additional context would help us as we consider how we can most effectively report these types of metrics.

Please note that for us to consider a pull request, we would need you to sign the contributor agreement.

Thanks again,
Kelsey

Generated at Thu Feb 08 04:32:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.