[SERVER-27472] Double Array with NaN values - $sum , $avg,$min returns NaN results but not $max Created: 20/Dec/16 Updated: 27/Oct/23 Resolved: 20/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kowshik [X] | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Works as Designed | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Mongo 3.2 and Mongo 3.4 |
||
| Issue Links: |
|
||||
| Assigned Teams: |
Query
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Description |
|
I have document in the similar structure to hold a 5 minutely values in a document. Document structure:
I am doing an hourly aggregation to find min , max, average, sum per hour . One of the values is NaN. Aggregation returns the following results Aggregation Query :
Aggregation results: Sum: NaN Question is :
Should All operators should not behave in a similar way.? |
| Comments |
| Comment by Kowshik [X] [ 20/Dec/16 ] | |||||||||||||||||||||||||||||
|
Thanks Charlie and David. Yes, I tried two options 1. filter, which is same as the above $cond is very useful too. Good to know. Thank you for the quick ,prompt and descriptive responses . Really helpful! "values" : { "0" : [null, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], "1" : [13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0], }Best Regards | |||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 20/Dec/16 ] | |||||||||||||||||||||||||||||
|
david.storch pointed out that this might be more clearly achieved by using $filter:
| |||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 20/Dec/16 ] | |||||||||||||||||||||||||||||
|
Oh, good point. I believe you could work around this issue by using $cond:
This is also weird and seems to contradict my argument above, but we also have a precedent of ignoring 'nullish' values (missing, null or undefind) within aggregation accumulators such as $avg, $min and $max. | |||||||||||||||||||||||||||||
| Comment by Kowshik [X] [ 20/Dec/16 ] | |||||||||||||||||||||||||||||
|
Hi Charlie, I get your explanation. But when we do aggregations we want to ensure excel type behavior in our case. Is there a way we can ensure this ? Probably an option about how to treat NaN's ? Input to Excel : NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0 *Output * Sum 77 Regards Kowshik | |||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 20/Dec/16 ] | |||||||||||||||||||||||||||||
|
Hmm, it looks like someone has given some thought to this case at some point. For $min and $max, we eventually end up calling compareDoubles() with two values, one of which is NaN. This has a special case for NaN, which decides that NaN should be treated as smaller than all other doubles. Kowshik, I see your point that you would expect all of these operations to return the same thing, but I think it's a little more subtle than that. I think that all arithmetic involving NaN should absolutely return NaN, since there's no clear answer otherwise. $min and $max are different though, since these operators must impose a total order on all possible values, including NaN and even other types:
We have a similar problem when we need to insert a NaN value into an index, and when we need to order results as part of a $sort stage or during a sort for a query. The index needs to have an entry for the NaN value, and it does need to go somewhere, so we need to give it a position in the index. For sorts, a NaN can absolutely be the result of a query, and will need to be ordered somehow before being returned to the client, so again we must decide where to order it:
I will add that there is a subtler point that is worth noting. Our query system has some special "type bracketing" which prevents a query like {a: {$gt: 4}} from returning results like {a: "string"} which are technically greater than 4, but probably not what you were looking for. In the same sense, the value NaN (literally, not a number), will not be considered as part of the numeric type bracket, which gives this behavior:
I understand that this is very confusing, and I'm happy to answer any further questions, but I'm closing this ticket as "Works as Designed" since we do not plan on changing this behavior. |