- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Query Execution
I'm filing this based on my understanding of a proposal from joe.sack. We currently count how many times each command runs in total and how many of those runs fail. However, serverStatus gives no indication of the cause of failure. Query product management is interested in knowing which errors are most frequent. For instance, this came up in the context of tracking how many find or aggregate commands fail because their memory budget was exhausted and spilling to disk was disabled. I could also imagine interest in tracking transient or retryable errors, which can require client-side retries. Similarly, if a bug caused correct queries to fail spuriously with some internal unnamed error code, this data would allow us to assess the prevalence of the issue in Atlas.
I can imagine two different ways of displaying this data. Option one would be to report counts of error codes across all commands:
errorCodes: { DuplicateKey: 123, Location4567800: 456, ... }
Alternatively, we could take up more space and report a more granular view, with error code counts broken down per command:
MongoDB Enterprise > db.serverStatus().metrics.commands
...
"find" : {
    "failed" : 7,
    "errorCodes" : {
        "DuplicateKey" : 3,
        "Location4567800" : 4
    },
    "total" : NumberLong(100)
},
...
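To make the relationship between the two options concrete, here is a minimal Python sketch of the bookkeeping. It is only an illustration and not server code: the class `CommandErrorStats` and its methods are hypothetical names, and the sketch assumes each failed run is attributed to exactly one error code name. Under that assumption the per-command view is the primary data, and the global view from option one can be derived from it by summing across commands.

```python
from collections import Counter, defaultdict


class CommandErrorStats:
    """Hypothetical in-memory model of the proposed counters (not server code)."""

    def __init__(self):
        self.total = Counter()                   # command name -> number of runs
        self.failed = Counter()                  # command name -> number of failed runs
        self.error_codes = defaultdict(Counter)  # command name -> error code name -> count

    def record(self, command, error_code=None):
        """Record one run of `command`; `error_code` is None when the run succeeded."""
        self.total[command] += 1
        if error_code is not None:
            self.failed[command] += 1
            self.error_codes[command][error_code] += 1

    def per_command(self):
        """Option two: error code counts nested under each command."""
        return {
            cmd: {
                "failed": self.failed[cmd],
                "errorCodes": dict(self.error_codes[cmd]),
                "total": self.total[cmd],
            }
            for cmd in self.total
        }

    def global_error_codes(self):
        """Option one: error code counts summed across all commands."""
        merged = Counter()
        for codes in self.error_codes.values():
            merged.update(codes)
        return dict(merged)


# Reproduce the "find" numbers from the example above, plus one failing insert.
stats = CommandErrorStats()
for _ in range(93):
    stats.record("find")
for _ in range(3):
    stats.record("find", "DuplicateKey")
for _ in range(4):
    stats.record("find", "Location4567800")
stats.record("insert", "DuplicateKey")

find_view = stats.per_command()["find"]
global_view = stats.global_error_codes()
```

In this model the per-command error code counts always sum to that command's "failed" counter, so the more granular view loses nothing relative to the existing counters; the cost is the extra space per command noted above.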
is related to
- SERVER-67699 Add tracking for when change stream event exceeds 16Mb (Closed)
related to
- SERVER-73524 Report a histogram of error codes rather than just an error counter in serverStatus (Open)
- SERVER-94409 Add ErrorCode counters to FTDC (Open)