- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
We often run into issues determining whether a cluster is sized appropriately for a time-series workload, and this usually comes down to the cardinality of the workload. Prospective customers sometimes struggle to identify their workload's cardinality, and even sophisticated users can have trouble noticing when something goes wrong and their model no longer matches reality.
Adding explicit tracking (even approximate) of metaField cardinality would help both us and our customers diagnose sizing and modeling issues much more quickly.
We would likely not need to persist these estimations anywhere, since the main goal is tracking the working set rather than the full historical collection data. Concise cardinality estimators such as HyperLogLog and its variants could do this efficiently.
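To make the trade-off concrete, here is a minimal HyperLogLog sketch in Python. This is purely illustrative (not MongoDB's implementation): it shows how a few kilobytes of fixed-size state (2^p one-byte registers) can estimate the number of distinct metaField values with a few percent error, which is all the sizing discussion above needs.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                    # 2**p registers; p=12 -> ~1.6% std error
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Hash to 64 bits (SHA-1 here for simplicity; a real implementation
        # would use a fast non-cryptographic hash).
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64-p bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        # Small-range correction: fall back to linear counting while
        # many registers are still empty.
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw
```

With p=12 the sketch occupies 4096 registers regardless of how many distinct metaField values it has seen, which is why no persistence would be needed: the structure can simply be rebuilt in memory as the working set is observed.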
At a minimum we will likely want to track server-global cardinality, as this is the most important input for performance and sizing, and it can offer limited insight into modeling issues. The global numbers can be reported via serverStatus and FTDC. If the memory and performance overhead of per-collection tracking proves acceptable, reporting those numbers via collStats would give finer-grained detail for workloads that span multiple collections.
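The two-level tracking described above could be structured as sketched below. This is a shape illustration only: the field names (`metaFieldCardinalityEstimate` and the `timeseries` section) are hypothetical, not actual serverStatus or collStats output, and a plain set stands in for the HyperLogLog sketch to keep the example short.

```python
from collections import defaultdict

class CardinalityTracker:
    """Illustrative global + per-collection metaField cardinality tracking.

    A real implementation would hold one HLL sketch globally and one per
    collection; a set is used here as a stand-in estimator.
    """

    def __init__(self):
        self._global = set()
        self._per_collection = defaultdict(set)

    def observe(self, namespace, meta_field_value):
        key = repr(meta_field_value)          # canonicalize the metaField value
        self._global.add(key)                 # feeds the server-global estimate
        self._per_collection[namespace].add(key)

    def server_status(self):
        # Hypothetical serverStatus section (also suitable for FTDC capture).
        return {"timeseries": {"metaFieldCardinalityEstimate": len(self._global)}}

    def coll_stats(self, namespace):
        # Hypothetical collStats field for per-collection detail.
        return {"metaFieldCardinalityEstimate": len(self._per_collection[namespace])}
```

One design point this makes visible: the global estimate is not the sum of the per-collection estimates, since the same metaField value appearing in two collections counts once globally, which is exactly why per-collection reporting adds diagnostic value for multi-collection workloads.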