[SERVER-29621] $group by Objects (dictionary) is not reliable
Created: 14/Jun/17  Updated: 29/Jul/17  Resolved: 15/Jun/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.7, 3.4.5
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Jonathan Huot [X]
Assignee: Mark Agarunov
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Create a test collection containing the following document:

/* 1 */
{
    "_id" : ObjectId("594109ada25deacc151b3904"),
    "backends" : [ 
        {
            "port" : 7080,
            "host" : "127.0.0.1"
        }, 
        {
            "host" : "127.0.0.1",
            "port" : 7080
        }, 
        {
            "port" : 7080,
            "host" : "127.0.0.1"
        }
    ]
}

Run the following aggregation:

db.getCollection('test').aggregate([
{'$unwind': '$backends'},
{'$group': {'_id': '$backends'}}
])

Actual results, two documents:

/* 1 */
{
    "_id" : {
        "host" : "127.0.0.1",
        "port" : 7080
    }
}
 
/* 2 */
{
    "_id" : {
        "port" : 7080,
        "host" : "127.0.0.1"
    }
}

Expected results, one document:

/* 1 */
{
    "_id" : {
        "host" : "127.0.0.1",
        "port" : 7080
    }
}


 Description   

Hi,
Until now, I have been using aggregation to implement a sort of DISTINCT by running $group over several documents. It worked fine until one of the documents was rewritten with its fields in a different order.
As a consequence, when $foo is an object, you can't assume that

$group: {'_id': '$foo'}

is reliable.
If it's the normal behavior, could we mention this in the docs?
Thanks in advance



 Comments   
Comment by Asya Kamsky [ 24/Jun/17 ]

Please see this docs page which states: "If the specified <value> is a document, the order of the fields in the document matters"
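
The effect of field order on a grouping key can be illustrated outside the server with a minimal plain-JavaScript sketch. This is only an analogy and not how MongoDB implements $group: the helpers orderSensitiveKey, canonicalKey, and countDistinct are hypothetical names introduced for illustration, and canonicalKey only handles flat documents.

```javascript
// Order-sensitive key: JSON.stringify keeps the fields in insertion order,
// so two documents with the same fields in a different order differ.
function orderSensitiveKey(doc) {
  return JSON.stringify(doc);
}

// Canonical key (hypothetical helper, flat documents only):
// sort the field names before serializing.
function canonicalKey(doc) {
  var sorted = {};
  Object.keys(doc).sort().forEach(function (name) {
    sorted[name] = doc[name];
  });
  return JSON.stringify(sorted);
}

// Count how many distinct keys a list of documents produces.
function countDistinct(docs, keyFn) {
  var seen = {};
  docs.forEach(function (doc) {
    seen[keyFn(doc)] = true;
  });
  return Object.keys(seen).length;
}

// Same data as the "backends" array in the steps to reproduce.
var backends = [
  { port: 7080, host: '127.0.0.1' },
  { host: '127.0.0.1', port: 7080 },
  { port: 7080, host: '127.0.0.1' }
];

var naiveCount = countDistinct(backends, orderSensitiveKey);  // 2
var canonicalCount = countDistinct(backends, canonicalKey);   // 1
```

With an order-sensitive key the three subdocuments fall into two groups, which mirrors the two results reported in this ticket. Once the field order is normalized they collapse into one.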

Comment by Mark Agarunov [ 15/Jun/17 ]

Hello JonathanHuot,

Thank you for the report. You are seeing this behavior because $group takes the order of the fields into account when comparing documents, so { port, host } and { host, port } are treated as different keys. To ensure that the fields are always in the same order before the $group stage, you can use $project to rebuild the subdocument with a fixed field order. For example:

> db.test.aggregate([
      {'$unwind': '$backends'},
      {'$project': {
          'backends.port': '$backends.port',
          'backends.host': '$backends.host'
      }},
      {'$group': {'_id': '$backends'}}
  ])
{ "_id" : { "port" : 7080, "host" : "127.0.0.1" } }

This returns a single document as expected.
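
As a possible alternative to the $project stage, the group key can be built explicitly inside $group, so that every key has its fields in the same order regardless of how the subdocument was stored. The following is a sketch under the assumption that only host and port identify a backend. The variable name pipeline is introduced for illustration, and the collection name test is taken from the example above.

```javascript
// Build the _id document field by field, so its field order is fixed
// by the pipeline and not by the stored subdocument.
var pipeline = [
  { $unwind: '$backends' },
  { $group: { _id: { host: '$backends.host', port: '$backends.port' } } }
];

// In the mongo shell, run it with:
//   db.test.aggregate(pipeline)
```

Against the test document from the steps to reproduce, this should yield a single group, because all three array entries produce the same key with host first and port second.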

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. A question like this one, which involves more discussion, would be best posted on the mongodb-user group.

Thanks,
Mark
