We want to be able to track actual S3 requests (how many, their size, whether they are input or output, and whether they were served from the cache) and have them appear as WT statistics. Right now many of these things are tracked and reported via a logging mechanism. We really want them tracked in a standard way that can be accessed directly via statistics cursors and pushed into time series files that can be examined by t2.
TBD whether we want a general solution for extension statistics, or something more specific that models, say, the current file system statistics. Reads and writes to disk storage already have counter, size, and latency stats, so maybe we can reorganize the software (abstract out the "I/O stats collection") so it can be used by both. Note that the S3 store's I/O calls are triggered indirectly by a WT_STORAGE_SOURCE->ss_flush call; it is not the individual WT_FILE->write calls that cause the S3 transfer to happen.
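As a rough sketch of the "abstract out the I/O stats collection" idea, something like the following shape could be shared between disk I/O and S3 put/get. All names here (`IOStatsCollector`, `record`, `snapshot`) are hypothetical illustrations, not existing WiredTiger API:

```python
class IOStatsCollector:
    """Hypothetical shared abstraction: one collector per I/O kind
    (e.g. "s3_put", "s3_get", "block_read"), each tracking the
    counter/size/latency triple the disk stats already have."""

    def __init__(self, name):
        self.name = name          # which I/O path this collector covers
        self.ops = 0              # number of I/O operations
        self.bytes = 0            # total bytes transferred
        self.latency_us = 0       # cumulative latency, microseconds

    def record(self, nbytes, latency_us):
        # Called once per completed I/O; for S3 puts this would be
        # driven by the transfers that ss_flush triggers.
        self.ops += 1
        self.bytes += nbytes
        self.latency_us += latency_us

    def snapshot(self):
        # A shape a statistics cursor (or a t2 time series) could consume.
        return {"ops": self.ops, "bytes": self.bytes,
                "latency_us": self.latency_us}

# Usage: record two uploads caused by a flush, then read the totals back.
s3_put = IOStatsCollector("s3_put")
s3_put.record(8 << 20, 1200)   # one 8 MB upload taking 1.2 ms
s3_put.record(4 << 20, 900)
print(s3_put.snapshot())       # {'ops': 2, 'bytes': 12582912, 'latency_us': 2100}
```

If both the block manager and the S3 extension fed collectors like this, the general-vs-specific question reduces to where the collectors live and how they are surfaced through the statistics cursor.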
- This came out of discussions with firstname.lastname@example.org and email@example.com. We want to make technical and product decisions based on real data. The data collected by the microbenchmarks in PM-2524 only scratch the surface of what we need.
Acceptance Criteria (Definition of Done)
Ideally, we'd be able to look at the number and size of S3 put/get calls, along with their latency, and access this via statistics cursors and t2. We'll need some simple (probably Python) tests to demonstrate this, at least the statistics cursor part.
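A test along those lines might check that a flush bumps the new counters. A minimal sketch, with the stat keys (`s3_put_ops`, `s3_put_bytes`) hypothetical and the connection stubbed out for illustration (a real test would open `session.open_cursor('statistics:')` on a live connection):

```python
class StubStatisticsCursor:
    """Stand-in for a real statistics cursor, for illustration only."""
    def __init__(self, stats):
        self._stats = stats
    def __getitem__(self, key):
        # Real statistics cursors yield (description, printable, value).
        desc, value = self._stats[key]
        return (desc, str(value), value)

def check_s3_stats(cursor):
    # The assertion we want: after ss_flush, put count and bytes are nonzero.
    _, _, put_ops = cursor["s3_put_ops"]       # hypothetical stat key
    _, _, put_bytes = cursor["s3_put_bytes"]   # hypothetical stat key
    assert put_ops > 0, "flush should have issued at least one S3 put"
    assert put_bytes > 0, "puts should have transferred data"
    return put_ops, put_bytes

# Pretend a flush already happened and the extension recorded two puts.
cursor = StubStatisticsCursor({
    "s3_put_ops": ("S3 put calls", 2),
    "s3_put_bytes": ("S3 bytes uploaded", 12582912),
})
print(check_s3_stats(cursor))  # (2, 12582912)
```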