[SERVER-6197] $avg is calculated using integer math for integers even though result value is of type double Created: 24/Jun/12 Updated: 28/Oct/15 Resolved: 27/Jul/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.0-rc1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Aaron Staple | Assignee: | Matt Dannenberg |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Participants: | |||||||||||||
| Description |
|
Observed behavior: $avg on integers is computed with integer addition and integer division. As a result, the average of ( 1, 1, 1, ..., 1, 0 ) is 0, which is then reported as the double value 0.0. Test:
There is also some seemingly arbitrary overflow behavior: for example, two ints will overflow on a shard but not on a standalone mongod. |
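The observed behavior can be illustrated with a minimal Python sketch. This is not the server's code; the function names `avg_integer_math` and `avg_double_division` are hypothetical, and Python's `//` stands in for integer division (equivalent to C-style truncation only for the non-negative inputs used here).

```python
def avg_integer_math(values):
    """Sum and divide with integer math, then report the result as a double."""
    total = 0
    for v in values:
        total += v                      # integer addition
    return float(total // len(values))  # integer division, converted afterwards


def avg_double_division(values):
    """Sum with integer math, but perform the division in floating point."""
    total = sum(values)
    return total / len(values)          # true division yields a float


values = [1] * 9 + [0]                  # nine ones followed by a zero
print(avg_integer_math(values))         # 0.0
print(avg_double_division(values))      # 0.9
```

The first function reproduces the reported symptom: the fractional part is discarded before the value ever becomes a double.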
| Comments |
| Comment by auto [ 27/Jul/12 ] |
|
Author: Matt Dannenberg (dannenberg.matt@gmail.com), 2012-07-23T14:27:55-07:00. Message: also do not count non-numeric types in $avg |
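A rough sketch of what "do not count non-numeric types" could mean for an average, written in Python rather than the server's own code. The function name `avg_skipping_non_numeric` is hypothetical, and returning `None` when no numeric values are present is an assumption made only for this illustration.

```python
def avg_skipping_non_numeric(values):
    """Average only the int/float entries; other types add to neither sum nor count."""
    numeric = [
        v for v in values
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not numeric:
        return None  # assumption: no numeric input means no average
    return sum(numeric) / len(numeric)


print(avg_skipping_non_numeric([1, "a", 2, None]))  # 1.5
```

The point is that the divisor is the number of numeric values, not the number of inputs, so a stray string does not drag the average down.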
| Comment by Aaron Staple [ 21/Jul/12 ] |
|
Putting in rc1 and assigning to Mathias based on Mathias's comment in |
| Comment by Aaron Staple [ 25/Jun/12 ] |
|
The reasoning for the described expected behavior is that it is a simple way to ensure consistency of aggregation pipeline behaviors. These (currently implemented) behaviors are:
If $avg were to coerce its operands to doubles as part of its division step, the implementation would be consistent with $divide and with $avg's own return data type. I think avoiding implicit double conversions is a reasonable policy. But if we want to adopt that policy we should apply it across the board, including to $divide and anywhere else it may apply. Maybe it's worth creating a separate ticket to decide about implicit double conversions and have this ticket depend on that one? |
| Comment by Andy Schwerin [ 25/Jun/12 ] |
|
I object to the "expected" behavior. We should follow the behavior of C and Python, and not coerce to double unless (1) a double appears in the pipeline, or (2) an explicit coercion is specified by the user. Consider the case of averaging very large 64-bit integers. The "average" computed using doubles will be less precise than that computed using 64-bit integers. |
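The precision concern can be checked with a small Python sketch. The helper names are hypothetical, and Python's arbitrary-precision `int` stands in for a 64-bit integer; the inputs are chosen so that their sum still fits in a signed 64-bit value.

```python
def avg_as_integers(values):
    """Average computed entirely in integer arithmetic."""
    return sum(values) // len(values)


def avg_as_doubles(values):
    """Average computed after converting every operand to a double."""
    return sum(float(v) for v in values) / len(values)


# 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
values = [2**53 + 1, 2**53 + 1]
print(avg_as_integers(values))  # 9007199254740993
print(avg_as_doubles(values))   # 9007199254740992.0
```

The integer average is exact, while the double average is off by one because each operand is rounded to the nearest representable double before the sum is taken.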