[SERVER-21654] aggregation pipeline not correctly on $stdDevPop and $stdDevSample not returning values correctly Created: 24/Nov/15  Updated: 24/Nov/15  Resolved: 24/Nov/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.0-rc2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Norberto Fernando Rocha Leite (Inactive) Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ages.json    
Operating System: ALL
Steps To Reproduce:
  • import ages.json file
  • run the following instructions on shell

    var pipeline = [
      {$group: {_id:"$city", ages: {$push: "$age"}}},
      {$group: {_id:"$_id", average:{$avg: "$ages"}, stdSamp: {$stdDevSamp: "$ages"},  stdPop: {$stdDevPop: "$ages"}}},
    ];
    var res = db.ages.aggregate(pipeline);
    printjson(res);
    

Participants:

 Description   

Running an aggregation pipeline composed by the following $stdDevPop and $stdDevSamp does not produce the expected output.
If we have a collection where documents contain individual "age" field and we want to calculate the standard deviation ($stdDevPop) for a given selector (city) the pipeline should be the following:

db.ages.aggregate([ {$group: { ages: {$push:"$age"}, _id:"$city"  }}, {$group: { _id: null,  std: {$avg: "$ages"}  }}   ] )

but the output I'm currently getting is the following:

{
  "waitedMS": NumberLong("0"),
  "result": [
    {
      "_id": null,
      "std": null
    }
  ],
  "ok": 1
}

Running the same operation but using an intermidiate collection does perform the expected result:

db.ages.aggregate([ {$group: { ages: {$push:"$age"}, _id:"$city",  }}, {$out:'aaa'}] )
db.aaa.aggregate( {$project: {_id: "$_id", s: { $stdDevPop: "$ages"}}})
{
  "waitedMS": NumberLong("0"),
  "result": [
    {
      "_id": "NYC",
      "s": 29.037802755871343
    }
  ],
  "ok": 1
}

Also tested with $avg with the same issue.

Seems like there is some issue with pipelined $group operations



 Comments   
Comment by Kelsey Schubert [ 24/Nov/15 ]

Hi norberto.leite,

Thank you for the report. I can tell you that this is expected behavior and documented here. $stdDevSamp treats arrays differently in the $group stage compared to the $project stage.In the $group stage, if the expression resolves to an array, $stdDevSamp treats the operand as a non-numerical value and ignores it. In the $project stage, with a single expression as its operand, if the expression resolves to an array, $stdDevSamp traverses into the array to operate on the numerical elements of the array to return a single value.

If you change the second stage of your pipeline from $group to $project, you should see the results that you expect.

var pipeline = [
  {$group: {_id:"$city", ages: {$push: "$age"}}},
  {$project: {_id:"$_id", average:{$avg: "$ages"}, stdSamp: {$stdDevSamp: "$ages"},  stdPop: {$stdDevPop: "$ages"}}},
];
db.ages.aggregate(pipeline);

Regards,
Thomas

Generated at Thu Feb 08 03:58:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.