[SERVER-74296] Consider using linear scaling function in t-digest when computing $median Created: 22/Feb/23 Updated: 01/Apr/23 Resolved: 01/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Irina Yatsenko (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Integration |
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
The linear scaling function doesn't provide good enough accuracy for extreme percentiles. This task includes tuning the choice of scaling function to balance runtime against accuracy. Three scaling functions are commonly used (based on sqrt, arcsin, and log), but we should also look into optimizing for the median. |
| Comments |
| Comment by Irina Yatsenko (Inactive) [ 01/Apr/23 ] |
|
Considered and found not beneficial. |
| Comment by Irina Yatsenko (Inactive) [ 01/Apr/23 ] |
|
Investigated replacing k2 with k0 but did not observe any considerable perf impact (https://docs.google.com/spreadsheets/d/1uKuiB2ZXon4R2i153Tlvu8jh7v7FOXePXh6PSo1TF8o/edit#gid=932660904). The accuracy might be slightly better with k0 for non-extreme percentiles, but we don't think it's worth pursuing. |
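
For context on the k0 vs. k2 comparison, here is a minimal sketch of the two scale functions and the cluster-size limit they imply. It assumes the definitions from the t-digest paper by Dunning and Ertl (k0 linear in q, k2 based on log(q / (1 - q)) with a normalizer of 4 log(n / delta) + 24) and the usual rule that a cluster may span at most one unit of k. The function names, `delta = 100`, and `n = 1_000_000` are illustrative assumptions and are not taken from the server code, which may differ.

```python
import math


def k0(q, delta):
    """Linear scale function: every cluster gets the same quantile width."""
    return delta * q / 2.0


def k0_inv(k, delta):
    return 2.0 * k / delta


def _k2_normalizer(delta, n):
    # Normalizer as given in the t-digest paper (assumed here, not verified
    # against the server implementation).
    return 4.0 * math.log(n / delta) + 24.0


def k2(q, delta, n):
    """Log-based scale function: clusters shrink towards q = 0 and q = 1."""
    return delta / _k2_normalizer(delta, n) * math.log(q / (1.0 - q))


def k2_inv(k, delta, n):
    return 1.0 / (1.0 + math.exp(-k * _k2_normalizer(delta, n) / delta))


def max_cluster_width(q_left, scale, scale_inv):
    """Largest quantile span of a cluster starting at q_left.

    Uses the constraint scale(q_right) - scale(q_left) <= 1.
    """
    return scale_inv(scale(q_left) + 1.0) - q_left


if __name__ == "__main__":
    delta, n = 100.0, 1_000_000  # hypothetical compression and input size
    for q in (0.001, 0.5):
        w0 = max_cluster_width(q, lambda x: k0(x, delta),
                               lambda k: k0_inv(k, delta))
        w2 = max_cluster_width(q, lambda x: k2(x, delta, n),
                               lambda k: k2_inv(k, delta, n))
        print(f"q={q}: k0 width={w0:.6f}, k2 width={w2:.6f}")
```

Under these assumed parameters, k0 allows the same cluster width at every quantile, while k2 allows wider clusters near the median and much narrower ones in the tails. That matches the observation that k0 might be slightly more accurate for non-extreme percentiles at the same compression.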