Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:

Assigned Teams: Query Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

I'm filing this based on my understanding of a proposal from joe.sack. We currently count how many times each command runs in total and how many of those runs fail. However, serverStatus gives no indication of why they failed. Query product management is interested in knowing which errors are most frequent. For instance, this came up in the context of tracking how many find or aggregate commands fail because their memory budget was exhausted and spilling to disk was disabled. I could also imagine interest in tracking transient or retryable errors, which can require client-side retries. Or we could have a bug which causes correct queries to fail spuriously with some internal unnamed error code, and this data would let us assess the prevalence of the issue in Atlas.

I can imagine two different ways of displaying this data. Option one would be to report counts of error codes across all commands:

errorCodes: {
    DuplicateKey: 123,
    Location4567800: 456,
    ...
}
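
As a rough illustration of option one, the following Python sketch keeps a single tally keyed by error code name, regardless of which command failed. The class `ErrorCodeCounters` and its methods are hypothetical names chosen for this sketch, not part of the server, and the error names are just the placeholder values from the example above.

```python
from collections import Counter


class ErrorCodeCounters:
    """Hypothetical server-wide tally of command failures by error code name."""

    def __init__(self):
        self._counts = Counter()

    def record_failure(self, code_name):
        # Called once per failed command, whichever command it was.
        self._counts[code_name] += 1

    def report(self):
        # Shape mirrors the proposed `errorCodes` sub-document.
        return {"errorCodes": dict(self._counts)}


counters = ErrorCodeCounters()
for code in ["DuplicateKey", "Location4567800", "DuplicateKey"]:
    counters.record_failure(code)
print(counters.report())
```

This view is compact, but it cannot say which command produced a given error.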

Alternatively, we could take up more space and present a more granular view, with error codes broken down on a per-command basis:

MongoDB Enterprise > db.serverStatus().metrics.commands
...
	"find" : {
		"failed" : 7,
		"errorCodes" : {
			"DuplicateKey" : 3,
			"Location4567800" : 4
		},
		"total" : NumberLong(100)
	},
...
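
To make the per-command option concrete, here is a minimal Python sketch of how such counters might be maintained. It also shows that the coarser all-commands view can be derived from the per-command one by summing. The names `command_stats`, `record_command`, and `roll_up` are assumptions made for this sketch and do not correspond to anything in the server.

```python
from collections import Counter, defaultdict


def new_command_stats():
    # One entry per command name, matching the proposed per-command layout.
    return {"total": 0, "failed": 0, "errorCodes": Counter()}


command_stats = defaultdict(new_command_stats)


def record_command(name, error_code_name=None):
    """Record one run of `name`; pass an error code name if the run failed."""
    stats = command_stats[name]
    stats["total"] += 1
    if error_code_name is not None:
        stats["failed"] += 1
        stats["errorCodes"][error_code_name] += 1


def roll_up(stats_by_command):
    """Sum per-command error codes into a single all-commands view."""
    combined = Counter()
    for stats in stats_by_command.values():
        combined.update(stats["errorCodes"])
    return dict(combined)


record_command("find")
record_command("find", "Location4567800")
record_command("insert", "DuplicateKey")
record_command("insert", "DuplicateKey")
print(roll_up(command_stats))
```

In this scheme each command's errorCodes values sum to its failed count, which is also true of the example above (3 + 4 = 7).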

is related to

SERVER-67699 Add tracking for when change stream event exceeds 16Mb (Closed)

related to

SERVER-73524 Report a histogram of error codes rather than just an error counter in serverStatus (Open)

SERVER-94409 Add ErrorCode counters to FTDC (Open)

Assignee: [DO NOT USE] Backlog - Query Execution
Reporter: David Storch
Participants: [DO NOT USE] Backlog - Query Execution, Bruce Lucas, David Storch, Joe Sack
Votes: 0
Watchers: 12

Created: Mar 30 2022 08:31:29 PM UTC
Updated: Sep 03 2024 06:13:36 PM UTC
