[SERVER-38068] Track successful command completion in FTDC Created: 09/Nov/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: Benjamin Caimano (Inactive)
Assignee: Backlog - Service Architecture
Resolution: Unresolved
Votes: 0
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-38104 serverStatus metrics for currently in... Closed
Assigned Teams:
Service Arch

 Description   

Currently, we track two buckets for command execution:

  • "total" - the beginning of a command execution
  • "failed" - the completion with error of a command execution

I believe there should be a third metric bucket:

  • "ok" - the completion without error of a command execution
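
A minimal sketch of the three buckets, in Python rather than the server's C++, purely for illustration. The class name `CommandCounters` and its methods are hypothetical and are not taken from the server code; the only assumption carried over from the description is that "total" is incremented at the start of execution and "failed" or "ok" at the end.

```python
import threading


class CommandCounters:
    """Hypothetical per-command counters; "ok" is the proposed third bucket."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total = 0   # incremented when execution begins
        self.failed = 0  # incremented when execution ends with an error
        self.ok = 0      # proposed: incremented when execution ends without error

    def run(self, command):
        """Run a zero-argument callable and update the buckets around it."""
        with self._lock:
            self.total += 1
        try:
            result = command()
        except Exception:
            with self._lock:
                self.failed += 1
            raise
        with self._lock:
            self.ok += 1
        return result

    def in_progress(self):
        """Commands that have started but not yet finished either way."""
        with self._lock:
            return self.total - self.failed - self.ok
```

With only "total" and "failed", `in_progress()` could not be computed, because a started command and a successfully completed one look the same.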


 Comments   
Comment by Bruce Lucas (Inactive) [ 15/Nov/18 ]

"total" - "failed" counts commands that are in progress or have completed successfully. The "ok" metric would allow us to separate in-progress commands from successfully completed ones. Also, for infrequently executed long-running commands (which in fact tend to be the troublemakers), the "total" metric and the "ok" metric will mark the start and the end of the command, which can be useful for diagnosis.
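
As a rough illustration of the point above, the sketch below works on made-up samples of one command's counters. The sample layout (a `t` timestamp plus `total`, `failed`, `ok`) and the helper names are assumptions for the example; the "ok" field does not exist today.

```python
def in_progress(sample):
    """Started but not yet completed: total - failed - ok."""
    return sample["total"] - sample["failed"] - sample["ok"]


def first_increase(samples, field):
    """Timestamp of the first sample where `field` grew, or None."""
    for prev, cur in zip(samples, samples[1:]):
        if cur[field] > prev[field]:
            return cur["t"]
    return None


# Invented data: one long-running command starts at t=1 and succeeds at t=3.
samples = [
    {"t": 0, "total": 0, "failed": 0, "ok": 0},
    {"t": 1, "total": 1, "failed": 0, "ok": 0},
    {"t": 2, "total": 1, "failed": 0, "ok": 0},
    {"t": 3, "total": 1, "failed": 0, "ok": 1},
]

started_at = first_increase(samples, "total")   # start of the command
finished_at = first_increase(samples, "ok")     # successful end of the command
running = [in_progress(s) for s in samples]     # in-progress count per sample
```

Without "ok", the samples at t=2 and t=3 would be identical, so the end of the command would not be visible in the data.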

Comment by Danny Hatcher (Inactive) [ 15/Nov/18 ]

Speaking to the "ok" metric, what is the benefit this provides? Isn't "ok" simply "total" - "failed"? Are there situations in which a quick glance at those two existing metrics was not sufficient to diagnose an issue?

Comment by Kevin Pulo [ 15/Nov/18 ]

It might also be good to do this for opcounters (though perhaps not success/fail), e.g. a "retired" metric for each opcounter, which gets incremented when each op ceases consuming resources (e.g. returns results to the client, is killed, ...).

Comment by Bruce Lucas (Inactive) [ 12/Nov/18 ]

I like this idea, but we should think about the impact on FTDC: it will add about 200 metrics by my count. Most of those will not change, so the delta will be 0 most of the time and the data size may be fine, but it will have some impact on the downstream consumers of FTDC. Worth some thought.
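
To make the data-size argument concrete, here is a simplified sketch of delta encoding followed by run-length encoding of zeros. It is an illustration of the general idea only, not FTDC's actual on-disk format; the function names and the `("zeros", n)` representation are invented for the example.

```python
def delta_encode(samples):
    """Per-metric differences between consecutive samples (lists of ints)."""
    return [
        [cur - prev for prev, cur in zip(a, b)]
        for a, b in zip(samples, samples[1:])
    ]


def rle_zeros(values):
    """Collapse each run of zeros into a ("zeros", n) pair; keep other values."""
    out = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            out.append(("zeros", run))
            run = 0
        out.append(v)
    if run:
        out.append(("zeros", run))
    return out


# 200 hypothetical new "ok" counters, of which only one changes between samples.
before = [0] * 200
after = list(before)
after[5] = 1

deltas = delta_encode([before, after])
encoded = rle_zeros(deltas[0])
```

Here 200 deltas shrink to three entries, which is why mostly idle counters may cost little in storage even though downstream consumers still see 200 extra metrics.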

Comment by Benjamin Caimano (Inactive) [ 09/Nov/18 ]

I also believe that "total" is not a great name to contrast with "failed", but it may need to stick around for compatibility reasons.
