[SERVER-74359] Tune t-digest
Created: 24/Feb/23  Updated: 29/Oct/23  Resolved: 13/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 7.1.0-rc0

Type: Task
Priority: Major - P3
Reporter: Irina Yatsenko (Inactive)
Assignee: Irina Yatsenko (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Backport Requested: v7.0
Sprint: QI 2023-04-17
Participants:

 Description   

T-digest's trade-off between accuracy and performance can be tuned through the compaction factor, the size of the merge buffer, and the details of the scaling function. We do not plan to expose any of these parameters to customers; instead, we will tune them to a generally good "sweet spot". However, we might still end up introducing knobs for them if they prove helpful for testing or if no definitive sweet spot exists.
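To make the role of the scaling function more concrete, here is a minimal sketch of two scale functions from the t-digest literature, the linear `k0` and the arcsine `k1`, written as they are commonly stated. It is not taken from the server code: the function names are hypothetical, `delta` is assumed to be the compaction factor, and `k2` is left out because its normalization differs between write-ups. A centroid may span the quantile range [q, q'] only while k(q') - k(q) <= 1, so the shape of k decides where the digest spends its resolution.

```python
import math


def k0(q, delta):
    """Linear scale: the allowed centroid size is the same at every quantile."""
    return delta / 2.0 * q


def k0_inverse(k, delta):
    return 2.0 * k / delta


def k1(q, delta):
    """Arcsine scale: allows only small centroids near q = 0 and q = 1."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def k1_inverse(k, delta):
    return (math.sin(2.0 * math.pi * k / delta) + 1.0) / 2.0


def max_centroid_width(scale, inverse, q, delta):
    """Largest quantile span that a centroid starting at q may cover.

    A centroid covering [q, q'] is allowed while scale(q') - scale(q) <= 1.
    The limit is clamped to scale(1.0) so the result never passes q = 1.
    """
    k_limit = min(scale(q, delta) + 1.0, scale(1.0, delta))
    return inverse(k_limit, delta) - q


if __name__ == "__main__":
    for delta in (500, 1000, 5000):
        mid = max_centroid_width(k1, k1_inverse, 0.5, delta)
        tail = max_centroid_width(k1, k1_inverse, 0.001, delta)
        print(f"delta={delta}: k1 width at median={mid:.6f}, at q=0.001={tail:.6f}")
```

Under these assumptions, a larger delta shrinks every centroid (more accuracy, more memory), while `k1` gives the tails finer resolution than the median at the same delta.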

Micro-benchmarks from the initial implementation of t-digest (the non-expr tests use 1e6 inputs and the expr tests use 100 inputs, both drawn from a normal distribution):

 

------------------------------------------------------------------------------------------------------
Benchmark                                                            Time             CPU   Iterations
------------------------------------------------------------------------------------------------------
PercentileAlgoBenchmarkFixture/tdigest_k0_delta1000          627636909 ns    627617190 ns            1
PercentileAlgoBenchmarkFixture/tdigest_k1_delta1000          682104826 ns    682094139 ns            1
PercentileAlgoBenchmarkFixture/tdigest_k2_delta500           643034220 ns    643014710 ns            1
PercentileAlgoBenchmarkFixture/tdigest_k2_delta1000          646875381 ns    646855993 ns            1
PercentileAlgoBenchmarkFixture/tdigest_k2_delta5000          742438555 ns    742403578 ns            1
PercentileAlgoBenchmarkFixture/tdigest_k2_delta1000_sorted   167359114 ns    167354307 ns            4
PercentileAlgoBenchmarkFixture/tdigest_k2_delta1000_batched  638900042 ns    638880589 ns            1
PercentileAlgoBenchmarkFixture/tdigest_expr_99_100                1684 ns         1684 ns       414402
PercentileAlgoBenchmarkFixture/tdigest_expr_01_100                1507 ns         1507 ns       464590
PercentileAlgoBenchmarkFixture/tdigest_expr_01_1000              33418 ns        33416 ns        20924
PercentileAlgoBenchmarkFixture/sortAndRank_expr_100                632 ns          632 ns      1109422
PercentileAlgoBenchmarkFixture/sortAndRank_expr_1000             22419 ns        22418 ns        31093

 

Note: a $group with $avg and a null group key on a collection with 1e7 small documents takes ~4500 ms in SBE and ~7200 ms in classic. So even major differences in the runtime of t-digest itself are unlikely to affect query latency significantly. However, for expressions it might make sense not to use t-digest at all: in the benchmarks above, sortAndRank is faster than t-digest for both the 100 and the 1000 variants.
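For reference, the exact alternative for small inputs can be sketched as a plain sort followed by a rank lookup. This is only an illustration of the general sort-and-rank idea, not the server's implementation: the function name is hypothetical, and the nearest-rank rule used here (the smallest value with at least a fraction p of the data at or below it) is an assumption.

```python
import math


def percentile_sort_and_rank(values, p):
    """Exact percentile by sorting, using the nearest-rank rule.

    p is a fraction in [0, 1]. The result is always one of the input values.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    # Nearest rank: ceil(p * n) is a 1-based rank; convert to a 0-based index.
    rank = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[rank]


if __name__ == "__main__":
    print(percentile_sort_and_rank([5, 1, 4, 2, 3], 0.5))
```

Sorting costs O(n log n) and keeps every input in memory, which is acceptable for the array sizes an expression typically sees but not for an accumulator over millions of documents.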



 Comments   
Comment by Githook User [ 13/Apr/23 ]

Author: Irina Yatsenko <irina.yatsenko@mongodb.com> (IrinaYatsenko)

Message: SERVER-74359 Tune t-digest settings
Branch: v7.0
https://github.com/mongodb/mongo/commit/b52563099a666bd3e2d3a2cc1b85de736779ce1b

Comment by Githook User [ 13/Apr/23 ]

Author: Irina Yatsenko <irina.yatsenko@mongodb.com> (IrinaYatsenko)

Message: SERVER-74359 Tune t-digest settings
Branch: master
https://github.com/mongodb/mongo/commit/896542d40dae91395eb41d9926148898a8f201cc

Comment by Irina Yatsenko (Inactive) [ 12/Apr/23 ]

Testing with genny workloads against dd80db90c00 in master indicates that we could get about a 3% improvement by switching the buffer size to be x3 rather than x5. This might not be much, but it also shows that we could use delta = 2000 instead of 1000 to get better accuracy without losing performance and with only a moderate increase in memory usage: 1000 more elements in the buffer and ~1200 more centroids. That brings the memory likely to be used by the accumulator to about 6000 + 5000 = 11,000 doubles (roughly 88 KB at 8 bytes per double), which is still well below our usual memory limits for accumulators.
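The memory arithmetic in this comment can be checked with a short sketch. It assumes the merge buffer holds `buffer_multiplier * delta` doubles (which matches the x3 / x5 wording and the 6000 figure) and takes the 5000 doubles for centroid storage directly from the comment. The helper names are hypothetical.

```python
BYTES_PER_DOUBLE = 8


def buffer_elements(delta, buffer_multiplier):
    """Number of doubles in the merge buffer (assumed: multiplier * delta)."""
    return delta * buffer_multiplier


def accumulator_bytes(delta, buffer_multiplier, centroid_doubles):
    """Approximate footprint: merge buffer plus centroid storage, in bytes."""
    total_doubles = buffer_elements(delta, buffer_multiplier) + centroid_doubles
    return total_doubles * BYTES_PER_DOUBLE


if __name__ == "__main__":
    old_buffer = buffer_elements(1000, 5)  # delta = 1000, buffer x5
    new_buffer = buffer_elements(2000, 3)  # delta = 2000, buffer x3
    print("extra buffer elements:", new_buffer - old_buffer)
    print("estimated bytes:", accumulator_bytes(2000, 3, 5000))
```

At 8 bytes per double, 11,000 doubles come to about 88 KB.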
