[SERVER-35481] _id field in unique index will return all results with a distinct ObjectId Created: 07/Jun/18  Updated: 27/Oct/23  Resolved: 11/Jun/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Romain CAMPIGOTTO Assignee: Nick Brewer
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I am designing an academic distributed application in which a program streams and collects tweets (via the Twitter Streaming API) and, in particular, stores profiles (author information) in a dedicated collection in my MongoDB database.

In this collection, I have a unique index applied on 2 fields.

My distributed application uses the Apache Camel framework and a RabbitMQ server. When I set the number of consumers behind my streamer to more than 1, I get duplicates in my collection. More precisely, each duplicate consists of an incomplete entry (with numerous missing fields) and a complete entry.

If I drop the unique index and try to re-create it, I get an error saying that duplicates are present in the collection.
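
Before re-creating the index, it can help to list which documents collide on the two intended key fields. The following is a minimal sketch, assuming the mongo shell; the helper name `findDuplicateProfiles` is hypothetical, and `db` is passed in explicitly so the function can also be exercised outside the shell.

```javascript
// Hypothetical helper: `db` is the mongo shell's database handle
// (or any object exposing the same collection API).
function findDuplicateProfiles(db) {
  var pipeline = [
    // Group on the two fields that are meant to be unique together.
    { $group: {
        _id: { broadcaster: "$broadcaster", account_id: "$account_id" },
        ids: { $push: "$_id" },   // collect the _id of every document in the group
        count: { $sum: 1 }        // count how many documents share the pair
    } },
    // Keep only the groups that contain more than one document.
    { $match: { count: { $gt: 1 } } }
  ];
  return db.profiles.aggregate(pipeline).toArray();
}
```

In the shell this would be called as `findDuplicateProfiles(db)`; each result lists one (broadcaster, account_id) pair together with the `_id` values of the documents that share it.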

I think it is a concurrent-access problem, since the collected dates of the entries in each duplicate pair are very close.

Below is an example of duplicates, together with the indexes currently applied on the collection:

> db.profiles.find({"account_id" : "761902690985127936"})
{ "_id" : ObjectId("5b14e02f9ae95e0e31793a93"), "broadcaster" : "Twitter", "account_type" : "account", "account_id" : "761902690985127936", "collected_date" : ISODate("2018-06-04T06:46:07.361Z") }
{ "_id" : ObjectId("5b14e02f9ae95e0e31793a94"), "broadcaster" : "Twitter", "account_type" : "account", "account_id" : "761902690985127936", "collected_date" : ISODate("2018-06-04T06:46:07.361Z"), "user_id" : "761902690985127936", "lang" : "eng", "location" : "Kuala Lumpur City", "user_name" : "Kakajan Haytlyyev #FBRParty", "user_account" : "kkjn1966", "utc_offset" : "GMT+0:00", "profile_link" : "https://twitter.com/kkjn1966", "account_created_at" : NumberLong("1470486732000"), "description" : "SAY WHAT YOU MEAN AND MEAN WHAT YOU SAY.   \nPolitical Correctness Is Not Allowed\n#Resistance #ResistanceUnited #StrongerTogether", "is_verified" : false, "geo_enabled" : false, "profile_image_url" : "http://pbs.twimg.com/profile_images/992869258337042432/xYbRXlgO.jpg", "profile_background_image_url" : "http://abs.twimg.com/images/themes/theme1/bg.png", "followers_count" : 2809, "friends_count" : 2752, "listed_count" : 3, "statuses_count" : 19472, "is_contributor_enabled" : false, "is_translator" : false, "is_protected" : false }
> db.profiles
db.profiles
> db.profiles.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "documents.profiles"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "broadcaster" : 1,
                        "account_id" : 1,
                        "_id" : -1
                },
                "name" : "app_key",
                "ns" : "documents.profiles"
        },
        {
                "v" : 1,
                "key" : {
                        "broadcaster" : 1,
                        "user_account" : 1,
                        "_id" : -1
                },
                "name" : "broadcaster_1_user_account_1__id_-1",
                "ns" : "documents.profiles"
        }
]
>



 Comments   
Comment by Nick Brewer [ 11/Jun/18 ]

Hi Romain,

The 1 and -1 values are used to specify whether an ascending or descending index sort is used, per the Compound index documentation. While this is more thoroughly documented in relation to compound indexes, it is true of single-key indexes as well.

If you want to create a unique index against, for example, the "broadcaster" and "account_id" fields, I'd suggest removing the "_id" field from your unique indexes.

Regards,

Nick
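
The suggestion above could be applied along the following lines. This is a sketch only, assuming the mongo shell and the index name "app_key" shown in the getIndexes() output; the wrapper function `recreateAppKey` is hypothetical. Note that creating a unique index fails while documents with the same (broadcaster, account_id) pair are still present, so existing duplicates have to be cleaned up first.

```javascript
// Hypothetical helper: `db` is the mongo shell's database handle.
function recreateAppKey(db) {
  // Drop the existing index by its name.
  db.profiles.dropIndex("app_key");
  // Re-create it on the two business fields only, without _id,
  // so that uniqueness is enforced on (broadcaster, account_id).
  return db.profiles.createIndex(
    { broadcaster: 1, account_id: 1 },
    { unique: true, name: "app_key" }
  );
}
```

With such an index in place, a second insert for the same broadcaster and account_id would be rejected with a duplicate key error instead of being stored as a separate document.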

Comment by Romain CAMPIGOTTO [ 08/Jun/18 ]

Thank you Ramon Fernandez for your quick answer.

No, I do not have any documents where all three keys are the same.
Perhaps I misunderstood, but what exactly is the difference between the values 1 and -1 in indexes? I understood that -1 was a way to say "keep in the index, but do not consider it in the unique key". But I was wrong!

Comment by Ramon Fernandez Marina [ 07/Jun/18 ]

zankosax, the two documents that you reference have distinct _id values, so both are allowed in the unique index.

Do you have some other docs where all three keys are the same?

Thanks,
Ramón.
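
The behaviour described in this comment can be illustrated with a toy model in plain JavaScript. It is not MongoDB's implementation, and the function names `indexKey` and `violatesUnique` are hypothetical; it only shows that uniqueness is checked on the combination of all indexed fields, and that the 1 / -1 direction plays no part in that check.

```javascript
// Toy model of a unique compound index; NOT how MongoDB stores index keys.
function indexKey(doc, keyPattern) {
  // Only the field names matter for uniqueness; the 1 / -1 values
  // describe sort direction and are ignored here.
  return JSON.stringify(Object.keys(keyPattern).map(function (field) {
    return doc[field];
  }));
}

// Returns true if two documents produce the same key for the given pattern.
function violatesUnique(docs, keyPattern) {
  var seen = {};
  for (var i = 0; i < docs.length; i++) {
    var key = indexKey(docs[i], keyPattern);
    if (seen[key]) return true;
    seen[key] = true;
  }
  return false;
}

// The two documents from the description, reduced to the indexed fields.
var docs = [
  { _id: "5b14e02f9ae95e0e31793a93", broadcaster: "Twitter", account_id: "761902690985127936" },
  { _id: "5b14e02f9ae95e0e31793a94", broadcaster: "Twitter", account_id: "761902690985127936" }
];

// With _id in the key, every document is distinct, so nothing is rejected.
console.log(violatesUnique(docs, { broadcaster: 1, account_id: 1, _id: -1 })); // false
// Without _id, the two documents collide.
console.log(violatesUnique(docs, { broadcaster: 1, account_id: 1 }));          // true
```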
