[SERVER-54229] Resharding metrics output from the currentOp command should be reset when a new operation is started Created: 03/Feb/21 Updated: 29/Oct/23 Resolved: 23/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Lamont Nelson | Assignee: | Kshitij Gupta |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-autocommits | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Sharding 2021-03-08, Sharding 2021-03-22, Sharding 2021-04-05 | ||||||||
| Participants: | |||||||||
| Story Points: | 2 | ||||||||
| Description |
|
The following metrics should be reset when a new resharding operation is started:
|
| Comments |
| Comment by Githook User [ 23/Mar/21 ] |
|
Author: Kshitij Gupta (kshitijng) <kshitij.gupta@mongodb.com>
Message:
|
| Comment by Lamont Nelson [ 16/Feb/21 ] |
|
Based on our conversation last week, I created |
| Comment by Bruce Lucas (Inactive) [ 08/Feb/21 ] |
|
Normally we take the derivative of cumulative counters for display and report it in units like "document / s", "bytes / s", etc. This makes it easy to see whether an operation is active and to correlate its performance with other metrics like CPU, disk, WT operations, etc. when doing performance analysis. When a cumulative counter like that is reset, the result is an artificial large negative spike that is misleading and interferes with things like taking averages.
Understood about not having per-collection metrics (and that's good), but I don't see the connection between that and resetting the metrics. What are you interested in seeing that wouldn't be possible if the metrics aren't reset?
The lastCommittedTransaction metric isn't an example of what I was asking about, as it's not a cumulative metric. We actually recently eliminated it from FTDC (
I'd be reluctant to inflate FTDC with a "cumulative" and a "last op" version of the same counters. If the most recent occurrence truly is of some special interest, the log file might be a better place to get that information. Logging this information on completion of a resharding operation would ensure that it is available for all operations, including the most recent as of any particular time. |
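The negative-spike effect described in the comment above can be illustrated with a minimal Python sketch. The function name `rates` and the sample values are hypothetical, chosen only for illustration; they are not taken from the server, FTDC, or any real tooling. The sketch differentiates two series of (timestamp, counter) samples: one that keeps accumulating, and one that is reset to zero when a second operation starts.

```python
def rates(samples):
    """Return the per-second rate between consecutive (time_s, value) samples."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((v1 - v0) / (t1 - t0))
    return out


# Hypothetical counter sampled once per second; the real throughput is a
# steady 100 documents/s in both series.
cumulative = [(0, 0), (1, 100), (2, 200), (3, 300), (4, 400)]

# Same workload, but the counter is reset to 0 at t=3 (a new operation starts).
reset = [(0, 0), (1, 100), (2, 200), (3, 0), (4, 100)]

cumulative_rates = rates(cumulative)
reset_rates = rates(reset)

print(cumulative_rates)  # [100.0, 100.0, 100.0, 100.0]
print(reset_rates)       # [100.0, 100.0, -200.0, 100.0]

# The artificial negative sample also distorts the average rate.
print(sum(cumulative_rates) / len(cumulative_rates))  # 100.0
print(sum(reset_rates) / len(reset_rates))            # 25.0
```

In this toy data the reset produces a single -200 sample and pulls the average from 100 down to 25, even though the underlying throughput never changed.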
| Comment by Max Hirschhorn [ 03/Feb/21 ] |
|
Hi bruce.lucas, we're having resharding report metrics for the actively running operation in serverStatus as part of the "shardingStatistics.resharding" section (see
I believe there's some prior art with lastCommittedTransaction for not tracking global measurements in serverStatus and FTDC. Do you feel it would be clearer to split "shardingStatistics.resharding" into "shardingStatistics.resharding" and "shardingStatistics.resharding.lastOp" (or some similar name)? |
| Comment by Bruce Lucas (Inactive) [ 03/Feb/21 ] |
|
Generally serverStatus metrics that represent cumulative counters or cumulative elapsed time are expected by downstream tooling to be cumulative since the server started, so I'm not sure I would expect those to be reset. Are there any other instances of cumulative server counters that get reset, for comparison? |