Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- storex-shortlist

Assigned Teams:

Storage Execution
Sprint:
Storage Execution 2025-07-21, Storage Execution 2025-08-18
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When diagnosing unexpected behavior with certain workloads, we sometimes find that additional metrics on the time-series write path would help. The following have been identified as likely useful:

- ~~Per-stripe gauges of the current open, archived, and idle buckets.~~ Split out to SERVER-105439
- Finer granularity counters for the reasons why a bucket reopening failed, e.g. due to era mismatch, hash collision, or malformed bucket.
- Anywhere we have a retry loop, one counter that ticks on each execution of the loop, as well as one counter that only ticks on the first execution (to help us understand the average number of retries). Bucket reopening and _id generation are key examples.
- A counter for the number of times we remove a cleared bucket from the catalog.
- "Direct write" counters, particularly the number of bucket-level operations (insert, update, delete) due both to direct writes to system.buckets and to measurement-level updates and deletes. Separate metrics as much as is reasonable for maximum visibility. Done in ~~SERVER-101263~~
- ~~A gauge for the current era span (the difference between the oldest and newest era with tracked buckets in the state registry).~~ Sufficiently covered by ~~SERVER-73363~~
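
The retry-loop counter pair described above could be sketched roughly as follows. This is an illustration only, written in Python rather than the server's own language. `RetryLoopCounters`, `run_with_retries`, `num_operations`, and `num_attempts` are hypothetical names, not existing server metrics, and the retry policy (a fixed attempt limit, with `None` meaning "try again") is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class RetryLoopCounters:
    """Hypothetical counter pair for one retry loop (names are illustrative)."""

    num_operations: int = 0  # ticks only on the first execution of the loop
    num_attempts: int = 0    # ticks on every execution of the loop

    def average_retries(self) -> float:
        """Mean number of retries (attempts beyond the first) per operation."""
        if self.num_operations == 0:
            return 0.0
        return (self.num_attempts - self.num_operations) / self.num_operations


def run_with_retries(attempt: Callable[[], Optional[T]],
                     counters: RetryLoopCounters,
                     max_attempts: int = 5) -> Optional[T]:
    """Call `attempt` until it returns a non-None result or attempts run out."""
    for i in range(max_attempts):
        if i == 0:
            counters.num_operations += 1  # first execution only
        counters.num_attempts += 1        # every execution
        result = attempt()
        if result is not None:
            return result
    return None


def make_flaky(failures: int) -> Callable[[], Optional[str]]:
    """Build an attempt function that fails `failures` times, then succeeds."""
    remaining = [failures]

    def attempt() -> Optional[str]:
        if remaining[0] > 0:
            remaining[0] -= 1
            return None
        return "ok"

    return attempt


counters = RetryLoopCounters()
run_with_retries(make_flaky(0), counters)  # succeeds on the first attempt
run_with_retries(make_flaky(2), counters)  # succeeds after two retries
```

With these two counters, the average number of retries falls out as `(num_attempts - num_operations) / num_operations`, so neither a per-operation histogram nor extra state is needed to answer that question.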

is related to

SERVER-73363 De-encapsulate bucket_catalog::BucketStateManager

Closed

SERVER-101263 Add more metrics for time-series updates and deletes

Closed

SERVER-102677 Add finer granularity counters for why reopening a time-series bucket failed

Closed

related to

SERVER-105439 Add per-stripe time-series metrics

Backlog

Assignee: Matt Kneiser
Reporter: Dan Larkin-York
Participants: Dan Larkin-York, Matt Kneiser
Votes: 0
Watchers: 6

Created: Dec 19 2024 04:43:18 PM UTC
Updated: Jul 23 2025 06:33:18 PM UTC
