We want to be able to track actual S3 requests (how many, their size, whether they are input or output, and whether they were served from the cache) and have them appear as WT statistics. Right now many of these things are tracked and reported via a logging mechanism. We really want them tracked in a standard way that can be accessed directly via statistics cursors and pushed into time series files that can be examined by t2.
TBD whether we want a general solution for extension statistics, or something more specific that models, say, the current file system statistics. Reads and writes to disk storage already have counter, size, and latency stats, so maybe we can reorganize the software (abstract out the "I/O stats collection") so it can be used by both. Note that the S3 store's I/O calls are triggered indirectly by a WT_STORAGE_SOURCE->ss_flush call; it is not the individual WT_FILE->write calls that cause the S3 transfer to happen.
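As a rough sketch of the "abstract out the I/O stats collection" idea, something like the following shape could be shared between disk I/O and S3 put/get. All names here (`IOStatsCollector`, `record`, `snapshot`) are hypothetical illustrations, not existing WiredTiger API:

```python
class IOStatsCollector:
    """Hypothetical shared abstraction: one collector per I/O kind
    (e.g. "s3_put", "s3_get", "block_read"), each tracking the
    counter/size/latency triple the disk stats already have."""

    def __init__(self, name):
        self.name = name          # which I/O path this collector covers
        self.ops = 0              # number of I/O operations
        self.bytes = 0            # total bytes transferred
        self.latency_us = 0       # cumulative latency, microseconds

    def record(self, nbytes, latency_us):
        # Called once per completed I/O; for S3 puts this would be
        # driven by the transfers that ss_flush triggers.
        self.ops += 1
        self.bytes += nbytes
        self.latency_us += latency_us

    def snapshot(self):
        # A shape a statistics cursor (or a t2 time series) could consume.
        return {"ops": self.ops, "bytes": self.bytes,
                "latency_us": self.latency_us}

# Usage: record two uploads caused by a flush, then read the totals back.
s3_put = IOStatsCollector("s3_put")
s3_put.record(8 << 20, 1200)   # one 8 MB upload taking 1.2 ms
s3_put.record(4 << 20, 900)
print(s3_put.snapshot())       # {'ops': 2, 'bytes': 12582912, 'latency_us': 2100}
```

If both the block manager and the S3 extension fed collectors like this, the general-vs-specific question reduces to where the collectors live and how they are surfaced through the statistics cursor.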
- This came out of discussions with firstname.lastname@example.org and email@example.com. We want to make technical and product decisions based on real data. The data collected by the microbenchmarks in PM-2524 only scratch the surface of what we need.
Acceptance Criteria (Definition of Done)
Ideally, we'd be able to look at the number and size of S3 put/get calls, along with their latency, and access this via statistics cursors and t2. We'll need some simple (probably Python) tests to demonstrate this, at least the statistics cursor part.
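A test along those lines might check that a flush bumps the new counters. A minimal sketch, with the stat keys (`s3_put_ops`, `s3_put_bytes`) hypothetical and the connection stubbed out for illustration (a real test would open `session.open_cursor('statistics:')` on a live connection):

```python
class StubStatisticsCursor:
    """Stand-in for a real statistics cursor, for illustration only."""
    def __init__(self, stats):
        self._stats = stats
    def __getitem__(self, key):
        # Real statistics cursors yield (description, printable, value).
        desc, value = self._stats[key]
        return (desc, str(value), value)

def check_s3_stats(cursor):
    # The assertion we want: after ss_flush, put count and bytes are nonzero.
    _, _, put_ops = cursor["s3_put_ops"]       # hypothetical stat key
    _, _, put_bytes = cursor["s3_put_bytes"]   # hypothetical stat key
    assert put_ops > 0, "flush should have issued at least one S3 put"
    assert put_bytes > 0, "puts should have transferred data"
    return put_ops, put_bytes

# Pretend a flush already happened and the extension recorded two puts.
cursor = StubStatisticsCursor({
    "s3_put_ops": ("S3 put calls", 2),
    "s3_put_bytes": ("S3 bytes uploaded", 12582912),
})
print(check_s3_stats(cursor))  # (2, 12582912)
```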