[SERVER-27472] Double Array with NaN values - $sum , $avg,$min returns NaN results but not $max Created: 20/Dec/16  Updated: 27/Oct/23  Resolved: 20/Dec/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kowshik [X] Assignee: Backlog - Query Team (Inactive)
Resolution: Works as Designed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mongo 3.2 and Mongo 3.4


Issue Links:
Duplicate
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

I have document in the similar structure to hold a 5 minutely values in a document.

Document structure:

{ 
    "_id" : ObjectId("5857f58d396adf19c41a3f14"), 
    "source" : "LONDON.CITY.ALL", 
    ""uIndex" : ISODate("2016-01-01T00:00:00.000+0000"), 
    "values" : {
        "0" : [NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], 
        "1" : [13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0],
		.....
		....
	"23" : [277.0, 278.0, 279.0, 280.0, 281.0, 282.0, 283.0, 284.0, 285.0, 286.0, 287.0, 288.0      ]
      }
}

I am doing an hourly aggregation to find min , max, average, sum per hour . One of the values is NaN.

Aggregation returns the following results

Aggregation Query :

db.City.aggregate([
                                                     {
                                                       $match:{"source":LONDON.CITY.ALL"}
                                                     },
                                                     
                                                    
                                                    {$unwind:"$values.0"}, 
                                          
                                                    {$group : { _id :"$source", sum:{$sum:"$values.0"},avg:{$avg:"$values.0"},max:{$max:"$values.0"},min:{$min:"$values.0"}}}
                                                    
                                                                                     
                                                ])

Aggregation results:

Sum: NaN
Avg:NaN
Max: 12
Min:Nan

Question is :

  1. Why is Sum , average ,Min operators returning Nan but Max is returning the maximum of that array "0" : [NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0] ?
  2. How do we handle NaN in case of a double array

Should All operators should not behave in a similar way.?



 Comments   
Comment by Kowshik [X] [ 20/Dec/16 ]

Thanks Charlie and David.

Yes, I tried two options

1. filter, which is same as the above
2 . Instead of using storing NaN, I tried storing Null and get the desired results

$cond is very useful too. Good to know.

Thank you for the quick ,prompt and descriptive responses . Really helpful!

"values" :

{ "0" : [null, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], "1" : [13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0], }

Best Regards
Kowshik NS

Comment by Charlie Swanson [ 20/Dec/16 ]

david.storch pointed out that this might be more clearly achieved by using $filter:

> db.City.aggregate([
    {$match: {"source": "LONDON.CITY.ALL"}},
    {$project: {source: 1, values: {$filter: {input: "$values.0", cond: {$ne: ["$$this", NaN]}}}}},
    {$unwind: "$values"},
    {
      $group: {
          _id: "$source",
          sum: {$sum: "$values"},
          avg: {$avg: "$values"},
          max: {$max: "$values"},
          min: {$min: "$values"}
      }
    }
]);
{ "_id": "LONDON.CITY.ALL", "sum": 77, "avg": 7, "max": 12, "min": 2 }

Comment by Charlie Swanson [ 20/Dec/16 ]

Oh, good point. I believe you could work around this issue by using $cond:

> db.City.insert({
    _id: ObjectId("5857f58d396adf19c41a3f14"),
    "source": "LONDON.CITY.ALL",
    "uIndex": ISODate("2016-01-01T00:00:00.000+0000"),
    "values": {
        "0": [NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    }
})
WriteResult({"nInserted": 1});
> db.City.aggregate([
    {$match: {"source": "LONDON.CITY.ALL"}},
    {$unwind: "$values.0"},
    {
      $project: {
          source: 1,
          value: {$cond: {if: {$eq: ["$values.0", NaN]}, then: null, else: "$values.0"}}
      }
    },
    {
      $group: {
          _id: "$source",
          sum: {$sum: "$value"},
          avg: {$avg: "$value"},
          max: {$max: "$value"},
          min: {$min: "$value"}
      }
    }
])
{ "_id": "LONDON.CITY.ALL", "sum": 77, "avg": 7, "max": 12, "min": 2 }

This is also weird and seems to contradict my argument above, but we also have a precedent of ignoring 'nullish' values (missing, null or undefind) within aggregation accumulators such as $avg, $min and $max.

Comment by Kowshik [X] [ 20/Dec/16 ]

Hi Charlie,

I get your explanation.

But when we do aggregations we want to ensure excel type behavior in our case.

Is there a way we can ensure this ? Probably an option about how to treat NaN's ?

Input to Excel : NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0

*Output *

Sum 77
Average 7
max 12
Min 2

Regards

Kowshik

Comment by Charlie Swanson [ 20/Dec/16 ]

Hmm, it looks like someone has given some thought to this case at some point. For $min and $max, we eventually end up calling compareDoubles() with two values, one of which is NaN. This has a special case for NaN, which decides that NaN should be treated as smaller than all other doubles.

Kowshik, I see your point that you would expect all of these operations to return the same thing, but I think it's a little more subtle than that. I think that all arithmetic involving NaN should absolutely return NaN, since there's no clear answer otherwise. $min and $max are different though, since these operators must impose a total order on all possible values, including NaN and even other types:

> db.foo.aggregate([{$project: {min: {$min: ["hello", NaN, 4, NumberInt(3)]}}}])
{ "_id" : 0, "min" : NaN }
> db.foo.aggregate([{$project: {max: {$max: ["hello", NaN, 4, NumberInt(3)]}}}])
{ "_id" : 0, "max" : "hello" }

We have a similar problem when we need to insert a NaN value into an index, and when we need to order results as part of a $sort stage or during a sort for a query. The index needs to have an entry for the NaN value, and it does need to go somewhere, so we need to give it a position in the index. For sorts, a NaN can absolutely be the result of a query, and will need to be ordered somehow before being returned to the client, so again we must decide where to order it:

> db.foo.insert({x: 1})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({x: NaN})
WriteResult({ "nInserted" : 1 })
> db.foo.find().sort({x: 1})
{ "_id" : ObjectId("58596b2d776cf164dbda9130"), "x" : NaN }
{ "_id" : ObjectId("58596b28776cf164dbda912f"), "x" : 1 }

I will add that there is a subtler point that is worth noting. Our query system has some special "type bracketing" which prevents a query like {a: {$gt: 4}} from returning results like {a: "string"} which are technically greater than 4, but probably not what you were looking for. In the same sense, the value NaN (literally, not a number), will not be considered as part of the numeric type bracket, which gives this behavior:

> db.foo.find().sort({x: 1})
{ "_id" : ObjectId("58596b2d776cf164dbda9130"), "x" : NaN }
{ "_id" : ObjectId("58596b28776cf164dbda912f"), "x" : 1 }
> db.foo.find({x: {$lt: 1}})
// No results.
> db.foo.find({x: {$gt: 1}})
// No results.

I understand that this is very confusing, and I'm happy to answer any further questions, but I'm closing this ticket as "Works as Designed" since we do not plan on changing this behavior.

Generated at Thu Feb 08 04:15:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.