-
Type: Task
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Sharding NYC
-
Fully Compatible
-
Sharding NYC 2023-04-17, Sharding NYC 2023-05-01, Sharding NYC 2023-05-15, Sharding NYC 2023-05-29, Sharding NYC 2023-06-12, Sharding NYC 2023-06-26, Sharding NYC 2023-07-10
Four most-used shard key patterns: single_hashed, ranged_compound, single_ranged, id_hashed
Test plan
How long does analyzeShardKey take to run by itself?
without samples
T1. Run ycsb_load for {5M, 50M, 1000M} documents, followed by analyzeShardKey for {field0: 1, field1: "hashed"}. This will tell us whether the run time of the command is linear in the number of documents.
T2. Repeat test T1 for {field0: 1}
T2a. Repeat test T1 for two analyzeShardKey commands running concurrently, one command for each key.
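A minimal sketch of how T1, T2 and T2a could be timed, written in Python. It is illustrative only: the namespace `ycsb.usertable`, the helper names, and the injected `run_command` callable are assumptions, not part of the test plan. The linearity check simply normalizes each measured duration to seconds per million documents; roughly constant values across the {5M, 50M, 1000M} runs would suggest linear scaling.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Command = Dict[str, Any]
Runner = Callable[[Command], Any]

# Assumption: the YCSB data lives in ycsb.usertable.
NAMESPACE = "ycsb.usertable"

# The two shard keys named in T1 and T2.
SHARD_KEYS: List[Dict[str, Any]] = [
    {"field0": 1, "field1": "hashed"},
    {"field0": 1},
]


def build_analyze_command(ns: str, key: Dict[str, Any]) -> Command:
    """Build an analyzeShardKey command document for one candidate key."""
    return {"analyzeShardKey": ns, "key": key}


def time_command(run_command: Runner, command: Command) -> float:
    """Return the wall-clock seconds taken by a single command."""
    start = time.perf_counter()
    run_command(command)
    return time.perf_counter() - start


def time_concurrently(run_command: Runner, commands: List[Command]) -> List[float]:
    """Run all commands at once (T2a) and return each command's duration."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda c: time_command(run_command, c), commands))


def seconds_per_million_docs(measurements: List[Tuple[int, float]]) -> List[float]:
    """Normalize (document count, seconds) pairs to seconds per million documents."""
    return [seconds / (docs / 1_000_000) for docs, seconds in measurements]
```

With PyMongo, `run_command` could plausibly be the `command` method of the `admin` database on a client connected to mongos; that wiring is left out here so the sketch runs without a cluster.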
with samples
T3. For each of the clusters containing {50M} documents, run {30, 60} minutes of ycsb_read50update50 with a sampling rate of 50/sec, followed by analyzeShardKey for {field0: 1, field1: "hashed"}. This will tell us how much longer the command takes when it also gathers read/write distribution metrics.
T4. Repeat test T3 for {field0: 1}
T4a. Repeat test T3 for two analyzeShardKey commands running concurrently, one command for each key.
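For sizing T3 and T4, it may help to estimate how many sampled queries analyzeShardKey will have to process. The sketch below builds a configureQueryAnalyzer command document for the 50/sec rate and computes an upper bound on the sample count for each run length. The namespace and helper names are assumptions; the real count can be lower if the workload issues fewer sampled operations than the configured rate.

```python
from typing import Any, Dict

# Assumption: the YCSB data lives in ycsb.usertable.
NAMESPACE = "ycsb.usertable"
SAMPLES_PER_SECOND = 50  # sampling rate stated in T3


def build_sampling_command(ns: str, samples_per_second: int) -> Dict[str, Any]:
    """Build a configureQueryAnalyzer command that turns sampling on."""
    return {
        "configureQueryAnalyzer": ns,
        "mode": "full",
        "samplesPerSecond": samples_per_second,
    }


def max_sampled_queries(minutes: int, samples_per_second: int) -> int:
    """Upper bound on the number of queries sampled during one run."""
    return minutes * 60 * samples_per_second


# Upper bounds for the {30, 60} minute runs in T3.
bounds = {m: max_sampled_queries(m, SAMPLES_PER_SECOND) for m in (30, 60)}
```

Under these assumptions the 30-minute run yields at most 90,000 samples and the 60-minute run at most 180,000.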
How do analyzeShardKey and query sampling impact concurrent workload performance?
T5. Run ycsb_load for {1000M} documents, followed by a long run of ycsb_read50update50 with a sampling rate of 50/sec. While ycsb_read50update50 is running, run analyzeShardKey every X minutes. (X depends on how long analyzeShardKey takes to run.)
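One possible way to pick X for T5 once the duration of analyzeShardKey is known from the earlier tests. This is a sketch under an assumed rule, not something the plan specifies: choose the smallest whole-minute interval such that analyzeShardKey is running for at most a given fraction of the workload. The `max_busy_fraction` parameter and both helper names are hypothetical.

```python
import math
from typing import List


def pick_interval_minutes(analyze_minutes: float, max_busy_fraction: float = 0.25) -> int:
    """Smallest whole-minute X such that analyze_minutes / X <= max_busy_fraction."""
    return max(1, math.ceil(analyze_minutes / max_busy_fraction))


def start_offsets(run_minutes: int, interval_minutes: int) -> List[int]:
    """Minutes after workload start at which to launch analyzeShardKey."""
    return list(range(interval_minutes, run_minutes, interval_minutes))


# Example: a 5-minute analyzeShardKey during a 120-minute workload run.
interval = pick_interval_minutes(5)
offsets = start_offsets(120, interval)
```

A lower `max_busy_fraction` leaves longer quiet periods between runs, which makes it easier to compare workload throughput with and without analyzeShardKey in flight.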
Full test descriptions are here: https://docs.google.com/document/d/1FjAvT-XCASxseYEFos4CX57ZFI_vSrb7UgQtTVzLvgU/edit?usp=sharing