[SERVER-6197] $avg is calculated using integer math for integers even though result value is of type double Created: 24/Jun/12  Updated: 28/Oct/15  Resolved: 27/Jul/12

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 2.2.0-rc1

Type: Improvement Priority: Major - P3
Reporter: Aaron Staple Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-6203 Aggregation operators should have wel... Backlog
is related to SERVER-6166 consider expanding to wider data type... Closed

 Description   

Observed behavior: $avg on integers is calculated using integer addition and division. As a result, the average of ( 1, 1, 1, ..., 1, 0 ) is computed as 0 and reported as the double value 0.0.
Expected behavior: $avg is computed and reported using double values.
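A minimal sketch in plain JavaScript (runnable outside the mongo shell) contrasting the observed and expected behavior on the ticket's own data of 1000 ones and one zero. The helper names `averageIntegerMath` and `averageDoubleMath` are hypothetical. Integer division is emulated with `Math.trunc`, since JavaScript numbers are always doubles.

```javascript
// Test data from the ticket: 1000 values of 1 followed by a single 0.
const values = new Array(1000).fill(1).concat([0]);

// Observed behavior: sum and divide with integer semantics; the truncated
// quotient is what then gets reported as a double.
function averageIntegerMath(xs) {
  let sum = 0;
  for (const x of xs) sum += x;        // integer-valued sum (1000 here)
  return Math.trunc(sum / xs.length);  // emulates C-style integer division
}

// Expected behavior: accumulate and divide in double precision.
function averageDoubleMath(xs) {
  let sum = 0.0;
  for (const x of xs) sum += x;
  return sum / xs.length;
}

console.log(averageIntegerMath(values)); // 0
console.log(averageDoubleMath(values));  // about 0.999 (1000 / 1001)
```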

Test:

var c = db.c;
c.drop();
 
// Insert 1000 values of 1 and one value of 0.
for( var i = 0; i < 1000; ++i ) {
    c.save( { a:NumberInt( 1 ) } );
}
c.save( { a:NumberInt( 0 ) } );
 
// The average is currently reported as 0.0; the expected value is 1000 / 1001 (about 0.999).
printjson( c.aggregate( { $group:{ _id:0, avg:{ $avg:'$a' } } } ) );

There is also some seemingly arbitrary overflow behavior: for example, adding two ints can overflow on a shard but not on a standalone mongod.
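The ticket does not say which code path widens its accumulator and which does not. As a hypothetical illustration only: the same two ints give different results depending on whether they are added in 32 bits or in a wider type, so two code paths that choose differently (for example, a shard and a standalone mongod) would disagree in the way described. The helper names are made up for this sketch.

```javascript
// Emulate 32-bit signed integer addition: `| 0` converts the exact sum to
// int32, wrapping around on overflow the way two's-complement hardware does.
function addInt32(a, b) {
  return (a + b) | 0;
}

// Widened addition: JavaScript numbers hold integers exactly up to 2^53,
// so this stands in for accumulating into a 64-bit long.
function addWidened(a, b) {
  return a + b;
}

const INT32_MAX = 2147483647;
console.log(addInt32(INT32_MAX, 1));   // -2147483648 (wrapped)
console.log(addWidened(INT32_MAX, 1)); // 2147483648
```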



 Comments   
Comment by auto [ 27/Jul/12 ]

Author: Matt Dannenberg <dannenberg.matt@gmail.com>
Date: 2012-07-23T14:27:55-07:00

Message: SERVER-6275 SERVER-6197 use double for $avg

also do not count non-numeric types in $avg
also part of SERVER-6166 up convert from int to long on $sum
Branch: master
https://github.com/mongodb/mongo/commit/186c5a411c3a1e616f7e876820a13af46d6faa4c
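A rough JavaScript sketch of the three behaviors the commit message lists. It is not the server's C++ implementation. It models (1) $avg accumulating in double, (2) $avg skipping non-numeric values, and (3) $sum starting as an int and up-converting to long once the total leaves the 32-bit range. The helper names, the `{ type, total }` result shape, and returning `null` when nothing numeric was seen are assumptions made for illustration.

```javascript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Hypothetical $avg accumulator: skip non-numeric values and
// accumulate and divide in double precision.
function avgAccumulator(values) {
  let sum = 0.0;
  let count = 0;
  for (const v of values) {
    if (typeof v !== "number") continue; // non-numeric values are not counted
    sum += v;
    count += 1;
  }
  return count === 0 ? null : sum / count; // null for "nothing to average" (assumption)
}

// Hypothetical $sum over integer inputs: report "int" while the running total
// fits in 32 bits, and up-convert to "long" (permanently) once it does not.
// Exact only while totals stay below 2^53, a limit of JavaScript numbers.
function sumIntegers(values) {
  let total = 0;
  let type = "int";
  for (const v of values) {
    total += v;
    if (total < INT32_MIN || total > INT32_MAX) type = "long";
  }
  return { type, total };
}

console.log(avgAccumulator([1, "x", 2, null, 3])); // 2
console.log(sumIntegers([INT32_MAX, 1]));          // { type: 'long', total: 2147483648 }
```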

Comment by Aaron Staple [ 21/Jul/12 ]

Putting in rc1 and assigning to Mathias based on Mathias's comment in SERVER-6166.

Comment by Aaron Staple [ 25/Jun/12 ]

The reasoning behind the expected behavior described above is that it is a simple way to make aggregation pipeline behaviors consistent with each other. These (currently implemented) behaviors are:

  • $divide performs division by coercing its operands to doubles and returning a double result value (even if the operands are integers)
  • $avg performs division using the type of the operand (which may be an integer) but then always returns a double result value

If $avg were to coerce its operands to doubles as part of its division step, the implementation would be consistent with $divide and with $avg's own return data type.
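To make the inconsistency concrete, here is a minimal JavaScript sketch with hypothetical helper names. Per the behaviors listed above, `$divide` applied to the integers 7 and 2 yields 3.5, while the pre-fix `$avg` of the integers 3 and 4 (sum 7, count 2) yields 3, reported as 3.0. Integer division is emulated with `Math.trunc` because JavaScript has no separate integer number type.

```javascript
// $divide as described: operands are coerced to double, so 7 / 2 gives 3.5.
function divideOp(a, b) {
  return a / b;
}

// Pre-fix $avg as described: the division uses the operands' integer type
// (emulated with Math.trunc), and only then is the result reported as a double.
function avgOldIntegerInputs(xs) {
  const sum = xs.reduce((acc, x) => acc + x, 0);
  return Math.trunc(sum / xs.length);
}

// Proposed $avg: coerce to double before dividing, consistent with $divide.
function avgCoerced(xs) {
  const sum = xs.reduce((acc, x) => acc + x, 0);
  return divideOp(sum, xs.length);
}

console.log(divideOp(7, 2));              // 3.5
console.log(avgOldIntegerInputs([3, 4])); // 3
console.log(avgCoerced([3, 4]));          // 3.5
```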

I think avoiding implicit double conversions is a reasonable policy. But if we want to adopt that policy we should apply it across the board, including to $divide and anywhere else it may apply.

Maybe it's worth creating a separate ticket to decide about implicit double conversions and have this ticket depend on that one?

Comment by Andy Schwerin [ 25/Jun/12 ]

I object to the "expected" behavior. We should follow the behavior of C and Python, and not coerce to double unless (1) a double appears in the pipeline, or (2) an explicit coercion is specified by the user. Consider the case of averaging very large 64-bit integers: an "average" computed using doubles will be less precise than one computed using 64-bit integers.
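A minimal sketch of the precision concern, using JavaScript `BigInt` as a stand-in for 64-bit integer arithmetic. This is an assumption: `BigInt` is arbitrary-precision, so it does not model int64 overflow. Averaging two copies of 2^53 + 1 with exact integer arithmetic returns the input unchanged. Converting to doubles first rounds each value to 2^53 and loses the low bit. The helper names are hypothetical.

```javascript
// 2^53 + 1: doubles represent every integer exactly only up to 2^53,
// so this is the smallest positive integer a double cannot hold.
const big = 2n ** 53n + 1n; // 9007199254740993

// Integer path: sum and divide with exact BigInt arithmetic.
// (A real 64-bit accumulator could also overflow near 2^63; not modeled here.)
function avgBigInt(xs) {
  const sum = xs.reduce((acc, x) => acc + x, 0n);
  return sum / BigInt(xs.length); // BigInt division truncates toward zero
}

// Double path: each value is rounded to the nearest double on conversion.
function avgDouble(xs) {
  const sum = xs.map(Number).reduce((acc, x) => acc + x, 0);
  return sum / xs.length;
}

console.log(avgBigInt([big, big])); // 9007199254740993n
console.log(avgDouble([big, big])); // 9007199254740992
```

The trade-off cuts both ways. The integer path is exact for large values like these, but it truncates fractional averages, which produces the 0-instead-of-0.999 result this ticket reports.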

Generated at Thu Feb 08 03:11:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.