[SERVER-74359] Tune t-digest Created: 24/Feb/23 Updated: 29/Oct/23 Resolved: 13/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0, 7.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Irina Yatsenko (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v7.0 |
| Sprint: | QI 2023-04-17 |
| Participants: | |
| Description |
|
T-digest's accuracy vs. performance can be tuned via the compaction factor, the size of the merge buffer, and details of the scaling function. We are not planning to expose any of these parameters to customers; instead we will tune them to hit a general "sweet spot". However, we might in the end introduce dial knobs for this, if that proves helpful for testing or if no definitive sweet spot exists. Micro-benchmarks from the initial implementation of t-digest (the non-expr tests using 1e6 inputs and the expr tests using 100 inputs, both with a normal distribution).
Note: $group with $avg and a null group key on a collection with 1e7 small documents takes ~4500 msec in SBE and ~7200 msec in classic. So even major differences in the runtime of t-digest itself are unlikely to affect query latency in a significant way. However, for the expressions it might make sense not to use t-digest at all. |
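For reference, the knobs named above can be sketched in a simplified merging t-digest (a hypothetical illustration, not the server implementation: `delta` is the compaction factor, `buffer_factor` sizes the merge buffer, and `_k` is the arcsine "k1" scaling function):

```python
import math

class TDigestSketch:
    """Minimal merging t-digest sketch (illustrative only)."""

    def __init__(self, delta=1000, buffer_factor=5):
        self.delta = delta                       # compaction factor
        self.buffer_cap = buffer_factor * delta  # merge-buffer size
        self.centroids = []                      # (mean, weight), sorted by mean
        self.buffer = []
        self.n = 0

    def _k(self, q):
        # k1 scaling function: spends more centroid budget near the tails.
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def add(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.buffer_cap:
            self._compact()

    def _compact(self):
        # Sort buffered points together with existing centroids, then greedily
        # merge neighbors while a centroid's span in k-space stays <= 1.
        pts = sorted(self.centroids + [(x, 1) for x in self.buffer])
        self.buffer.clear()
        if not pts:
            return
        total = sum(w for _, w in pts)
        self.n = total
        merged = []
        cum = 0            # weight of already-closed centroids
        cur_mean, cur_w = pts[0]
        k_lo = self._k(0.0)
        for mean, w in pts[1:]:
            q = min(1.0, (cum + cur_w + w) / total)
            if self._k(q) - k_lo <= 1:  # current centroid can absorb the point
                cur_mean = (cur_mean * cur_w + mean * w) / (cur_w + w)
                cur_w += w
            else:                       # close the centroid, start a new one
                merged.append((cur_mean, cur_w))
                cum += cur_w
                k_lo = self._k(cum / total)
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def quantile(self, q):
        # Crude estimate: mean of the centroid whose midpoint covers rank q*n.
        if self.buffer:
            self._compact()
        target = q * self.n
        cum = 0.0
        for mean, w in self.centroids:
            if cum + w / 2 >= target:
                return mean
            cum += w
        return self.centroids[-1][0]
```

Raising `delta` tightens the per-centroid k-space budget (more centroids, better accuracy, more memory), while raising `buffer_factor` amortizes more inserts over each sort-and-merge pass (faster, at the cost of a larger buffer) -- the trade-off this ticket is tuning.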
| Comments |
| Comment by Githook User [ 13/Apr/23 ] |
|
Author: Irina Yatsenko &lt;irina.yatsenko@mongodb.com&gt; (IrinaYatsenko)
Message: |
| Comment by Githook User [ 13/Apr/23 ] |
|
Author: Irina Yatsenko &lt;irina.yatsenko@mongodb.com&gt; (IrinaYatsenko)
Message: |
| Comment by Irina Yatsenko (Inactive) [ 12/Apr/23 ] |
|
Testing with genny workloads against dd80db90c00 on master indicates that we could get about a 3% improvement by switching the buffer size to 3x delta rather than 5x. This might not be much, but it also shows that we could use delta = 2000 instead of 1000 to get better accuracy without losing perf and with only a moderate increase in memory usage: 1000 more elements in the buffer and ~1200 more centroids, bringing the memory likely to be used by the accumulator to about 6000 + 5000 = 11K doubles (~88KB), which is still well below our usual memory limits for accumulators. |
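The memory figure above follows from back-of-envelope arithmetic (assuming IEEE 754 8-byte doubles; the 6000-double buffer and ~5000-double centroid storage are taken directly from the comment, not derived independently):

```python
DOUBLE_BYTES = 8                       # IEEE 754 binary64

delta = 2000                           # proposed compaction factor
buffer_doubles = 3 * delta             # merge buffer sized at 3x delta -> 6000
centroid_doubles = 5000                # ~centroid storage quoted in the comment

total_doubles = buffer_doubles + centroid_doubles
total_bytes = total_doubles * DOUBLE_BYTES

print(total_doubles)                   # 11000 doubles
print(total_bytes)                     # 88000 bytes, i.e. ~88KB
```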