- Type: Bug
- Resolution: Unresolved
- Priority: Minor - P4
- Affects Version/s: None
- Component/s: None
- Query Integration
- ALL
The $percentile and $median accumulators can currently be computed by one of three methods:
- "approximate": Uses the t-digest algorithm to calculate approximate percentiles without having to keep all of the data in memory.
- "discrete": An exact computation of the percentile where one of the values in the input data set is chosen as the percentile value based on its rank.
- "continuous": An exact computation of the percentile using linear interpolation between values in the input data set.
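To make the difference between the two exact methods concrete, here is a minimal Python sketch. It assumes the common SQL-style definitions (as in `PERCENTILE_DISC` and `PERCENTILE_CONT`); the server's exact rank formula may differ, and the function names are illustrative only.

```python
import math


def percentile_discrete(values, p):
    """Return an element of `values` chosen by rank (assumed rule: ceil(p * n))."""
    if not values or not 0.0 <= p <= 1.0:
        raise ValueError("need a non-empty input and 0 <= p <= 1")
    ordered = sorted(values)
    # 1-based rank ceil(p * n), clamped so that p == 0 selects the minimum.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def percentile_continuous(values, p):
    """Linearly interpolate around the fractional position p * (n - 1)."""
    if not values or not 0.0 <= p <= 1.0:
        raise ValueError("need a non-empty input and 0 <= p <= 1")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [10, 20, 30, 40]
print(percentile_discrete(data, 0.5))    # 20, always one of the inputs
print(percentile_continuous(data, 0.5))  # 25.0, interpolated between 20 and 30
```

The discrete result is always a member of the input, while the continuous result may be a value that never occurs in the data.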
Currently, any use of $percentile or $median as an accumulator, expression, or window function requires the user to specify method: "approximate". For the percentile accumulators, this makes sense because they always use t-digest; the exact percentile methods are disabled by default under featureFlagAccuratePercentiles. (Separately, we should consider prioritizing finishing the accurate percentiles feature and turning on the feature flag. My understanding from speaking with natalie.hill@mongodb.com is that it is almost complete.)
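For reference, a `$group` stage that satisfies the current requirement might look like the following, written as a Python pipeline definition of the kind passed to a driver. The field name `latencyMs` and the output names are hypothetical, and no server connection is made.

```python
# Pipeline definition only; field and output names are hypothetical.
pipeline = [
    {
        "$group": {
            "_id": None,
            "latencyPercentiles": {
                "$percentile": {
                    "input": "$latencyMs",
                    "p": [0.5, 0.9],
                    # The only value accepted while
                    # featureFlagAccuratePercentiles is off.
                    "method": "approximate",
                }
            },
            "latencyMedian": {
                "$median": {"input": "$latencyMs", "method": "approximate"}
            },
        }
    }
]
```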
When $percentile or $median is used as an expression, we never actually use t-digest. The implementation exposed to end users always ends up using the "discrete" method. The behavior that seems wrong to me is that we require users to type method: "approximate" and reject method: "discrete", when in reality both would use an accurate discrete algorithm under the hood. My recommendation would be to permit "discrete" in addition to "approximate" when the percentile accumulators are used as expressions.
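The recommendation can be sketched as a small validation rule. This is a hypothetical Python model, not the server's parser: the function name, the context strings, and the `accurate_flag_enabled` parameter are assumptions made for illustration, as is the idea that turning the flag on enables all three methods everywhere.

```python
def is_method_accepted(context, method, accurate_flag_enabled=False):
    """Model the proposed rule for which `method` values a percentile spec accepts.

    `context` is "accumulator" or "expression" (hypothetical labels).
    """
    if context not in ("accumulator", "expression"):
        raise ValueError(f"unknown context: {context!r}")
    if accurate_flag_enabled:
        # Assumption: with the flag on, all three methods are available.
        return method in ("approximate", "discrete", "continuous")
    if context == "expression":
        # Proposed: expressions already compute a discrete result,
        # so accept either name for it.
        return method in ("approximate", "discrete")
    # Accumulators use t-digest, so only "approximate" is accepted.
    return method == "approximate"


print(is_method_accepted("expression", "discrete"))   # True under the proposal
print(is_method_accepted("accumulator", "discrete"))  # False while the flag is off
```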
The situation is similar for window functions. The percentile window functions need to keep all of the data for the window in memory so that values exiting the window can be removed as the window advances. Since all of the data for the window is kept in memory, there is no need for t-digest. Again, when method: "approximate" is specified, the system actually uses "discrete" under the hood. Here too, it seems we should permit both method: "approximate" and method: "discrete" and make them behave identically for the percentile window functions.
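The reasoning about window functions can be illustrated with a short Python sketch. It is a hypothetical model, not the server's data structure: the class and function names are invented, and the rank rule ceil(p * n) is an assumed definition of the discrete percentile.

```python
import bisect
import math


class WindowPercentile:
    """Keep every value of the current window in sorted order."""

    def __init__(self):
        self._sorted = []

    def add(self, value):
        # A value enters the window: insert it at its sorted position.
        bisect.insort(self._sorted, value)

    def remove(self, value):
        # A value leaves the window: locate one occurrence and delete it.
        index = bisect.bisect_left(self._sorted, value)
        if index == len(self._sorted) or self._sorted[index] != value:
            raise ValueError(f"{value!r} is not in the window")
        del self._sorted[index]

    def percentile(self, p):
        # Exact discrete percentile; assumed rank rule is ceil(p * n).
        if not self._sorted:
            return None
        rank = max(1, math.ceil(p * len(self._sorted)))
        return self._sorted[rank - 1]


def rolling_median(values, width):
    """Discrete median over each full window of `width` consecutive values."""
    window = WindowPercentile()
    results = []
    for i, value in enumerate(values):
        window.add(value)
        if i >= width:
            window.remove(values[i - width])
        if i >= width - 1:
            results.append(window.percentile(0.5))
    return results


print(rolling_median([5, 1, 9, 3, 7], 3))  # [5, 3, 7]
```

Because the full window is already held in memory to support removal, an exact rank lookup comes at no extra storage cost, which is why an approximation adds nothing here.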
Related to:
- SERVER-91956 Improve precision for accurate percentiles (Backlog)
- SERVER-52245 Enable feature flag for Discrete and Continuous Percentile and Median Accumulators (Backlog)
- SERVER-93151 Enable shard pushdown of sorting required fields in accurate $percentile (Backlog)