[SERVER-74296] Consider using linear scaling function in t-digest when computing $median Created: 22/Feb/23  Updated: 01/Apr/23  Resolved: 01/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Irina Yatsenko (Inactive)
Assignee: Irina Yatsenko (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Integration
Backwards Compatibility: Fully Compatible
Participants:

 Description   

The linear scaling function doesn't provide good enough accuracy for extreme percentiles. This task includes tuning the choice of scaling function to balance runtime against accuracy. There are three commonly used ones (based on sqrt, arcsin, log), but we should also look into optimizing specifically for the median.
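
For illustration, here is a minimal sketch of how the choice of scaling function changes the allowed centroid size. It is not the server's implementation. It assumes the k0 (linear), k1 (arcsin) and k2 (log) definitions as given in Dunning's t-digest paper, including the k2 normalizer 4*log(n/delta) + 24; the helper name fits_in_one_centroid and the parameter values are hypothetical. A centroid covering the quantile range [q_left, q_right] is treated as allowed when k(q_right) - k(q_left) <= 1.

```python
import math


def k0(q, delta):
    # Linear scale function: every centroid may cover the same quantile width.
    return delta / 2.0 * q


def k1(q, delta):
    # Arcsin-based scale function: steeper near q = 0 and q = 1,
    # so centroids near the tails must be smaller.
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def k2(q, delta, n):
    # Log-based scale function (assumed normalizer from the t-digest paper).
    # Only defined for 0 < q < 1.
    z = 4.0 * math.log(n / delta) + 24.0
    return delta / z * math.log(q / (1.0 - q))


def fits_in_one_centroid(k, q_left, q_right):
    # Hypothetical helper: a centroid spanning [q_left, q_right] is allowed
    # when its width measured in k-space is at most 1.
    return k(q_right) - k(q_left) <= 1.0


DELTA = 100.0      # compression parameter (illustrative value)
N = 1_000_000      # number of samples, used only by k2 (illustrative value)

SCALES = {
    "k0": lambda q: k0(q, DELTA),
    "k1": lambda q: k1(q, DELTA),
    "k2": lambda q: k2(q, DELTA, N),
}

TAIL = (0.001, 0.011)     # 1% of the data near the low extreme
MIDDLE = (0.495, 0.505)   # 1% of the data around the median
```

With these values, the tail interval fits into a single centroid under k0 but not under k1 or k2, while the interval around the median fits under all three. That is why the linear function loses accuracy at extreme percentiles while remaining adequate near the median.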



 Comments   
Comment by Irina Yatsenko (Inactive) [ 01/Apr/23 ]

Considered and not found beneficial.

Comment by Irina Yatsenko (Inactive) [ 01/Apr/23 ]

Investigated replacing k2 with k0 but did not observe any considerable perf impact (https://docs.google.com/spreadsheets/d/1uKuiB2ZXon4R2i153Tlvu8jh7v7FOXePXh6PSo1TF8o/edit#gid=932660904).

The accuracy might be slightly better with k0 for non-extreme percentiles but we don't think it's worth pursuing.

Generated at Thu Feb 08 06:27:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.