[SERVER-38805] Why there is no direct operator to find duplicates inside an array Created: 02/Jan/19  Updated: 07/Jan/19  Resolved: 02/Jan/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ashish Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I have the following users collection:

    [
      { "_id": 1, "adds": ["111", "222", "333", "111"] },
      { "_id": 2, "adds": ["555", "666", "777", "555"] },
      { "_id": 3, "adds": ["888", "999", "000", "888"] }
    ]

I need to find the duplicates inside the `adds` array.
The expected output should be:

    [
      { "_id": 1, "adds": ["111"] },
      { "_id": 2, "adds": ["555"] },
      { "_id": 3, "adds": ["888"] }
    ]

I have tried several operators, including `$setUnion` and `$setDifference`, but none of them did the trick.



 Comments   
Comment by Asya Kamsky [ 07/Jan/19 ]

There are no such plans currently.

It's not clear to me what exactly it should do. Can you describe your use case in enough detail that I can understand what the array is tracking and why you would need to extract non-unique elements from it?

Comment by Ashish [ 06/Jan/19 ]

@Asya Kamsky OK, so is there any intent to add an operator for this?

Comment by Asya Kamsky [ 04/Jan/19 ]

Ashishlal95 usually it's either because there haven't been any requests for it or because it's not obvious what the syntax for it should be. I'm not aware of any such syntax in the languages I'm familiar with. It's usually handled in code, like here: https://stackoverflow.com/questions/9835762/how-do-i-find-the-duplicates-in-a-list-and-create-another-list-with-them
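
For illustration, here is a minimal sketch of what "handled in code" could look like on the client side, in JavaScript. The helper name `findDuplicates` is hypothetical and not a MongoDB or driver API; the sample documents are the ones from the description. It assumes the array holds primitive values such as strings, since a JavaScript `Map` compares objects by reference rather than by value.

```javascript
// Hypothetical helper (not part of MongoDB): returns the values that
// occur more than once in `values`, each listed once.
function findDuplicates(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  // A Map iterates in insertion order, so results follow first appearance.
  return [...counts].filter(([, n]) => n > 1).map(([v]) => v);
}

// Sample documents copied from the ticket description.
const users = [
  { _id: 1, adds: ["111", "222", "333", "111"] },
  { _id: 2, adds: ["555", "666", "777", "555"] },
  { _id: 3, adds: ["888", "999", "000", "888"] },
];

const result = users.map((u) => ({ _id: u._id, adds: findDuplicates(u.adds) }));
console.log(JSON.stringify(result));
```

This counts every element in a single pass, so it stays linear in the array length, at the cost of moving the whole array to the client first.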

Comment by Ashish [ 03/Jan/19 ]

Daniel Hatcher & Asya Kamsky I have already posted it on Stack Overflow (https://stackoverflow.com/questions/52839656/find-duplicate-inside-array-without-unwind) and got an answer. But it is just an annoying trick to find the duplicates. So my question is: why is there no direct operator that can find the duplicates inside an array?

Comment by Asya Kamsky [ 02/Jan/19 ]

Ashishlal95 this can already be done with existing array expressions.

Using your example:

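db.dups.aggregate([{$addFields:{
    dups:{$filter:{
        // $setUnion on a single array yields its distinct values
        input:{$setUnion:["$adds"]},
        as:"s",
        // keep a value only if it occurs more than once in the original array
        cond:{$gt:[ {$size:{$filter:{input:"$adds",cond:{$eq:["$$this","$$s"]} }}}, 1]}
    }}
}}])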
{ "_id" : 1, "adds" : [ "111", "222", "333", "111" ], "dups" : [ "111" ] }
{ "_id" : 2, "adds" : [ "555", "666", "777", "555" ], "dups" : [ "555" ] }
{ "_id" : 3, "adds" : [ "888", "999", "000", "888" ], "dups" : [ "888" ] }
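
To make the logic of the expression easier to follow, here is a rough plain-JavaScript mirror of it, step by step. This is only an illustration of the idea, not how the server evaluates the pipeline; the function name `dupsLikePipeline` is hypothetical. One difference to keep in mind: `$setUnion` does not guarantee the order of its output, whereas the `Set` used here preserves first-appearance order.

```javascript
// Plain-JavaScript mirror of the aggregation expression (illustration only).
function dupsLikePipeline(adds) {
  // {$setUnion: ["$adds"]}: collapse the array to its distinct values.
  const distinct = [...new Set(adds)];
  // Outer $filter: keep a distinct value `s` only if the inner
  // $filter + $size finds more than one element of `adds` equal to it.
  return distinct.filter((s) => adds.filter((x) => x === s).length > 1);
}

console.log(dupsLikePipeline(["111", "222", "333", "111"]));
```

Like the pipeline, this rescans the original array once per distinct value, which is fine for short arrays but grows quadratically for long ones.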

Comment by Danny Hatcher (Inactive) [ 02/Jan/19 ]

I apologize, I misread your question. $setUnion is used to compare multiple arrays; as you are attempting to find duplicates within an individual array it will not be sufficient in this case.

Comment by Danny Hatcher (Inactive) [ 02/Jan/19 ]

Hello Ashish,

Thanks for your report. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group.

I believe that $setUnion should address your use case. If it does not, I recommend providing examples of what you have attempted when posting on Google Groups or Stack Overflow.

Thank you,

Danny

Generated at Thu Feb 08 04:50:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.