[SERVER-15136] Duplicate _ids in production sharded cluster Created: 04/Sep/14  Updated: 18/Sep/14  Resolved: 04/Sep/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Duminy Assignee: Unassigned
Resolution: Done Votes: 0
Labels: balancer, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Linux
Participants:

 Description   

We're seeing evidence of duplicate _ids in the balancer's logs. An example error is below (data-specific details removed due to their sensitive nature).

balancer move failed: { cause: { active: false, ns: "...", from: "...", min: { customer_id: ObjectId('...'), sk_customer_shard_group: 50 }, max: { customer_id: ObjectId('...'), sk_customer_shard_group: 50 }, shardKeyPattern: { customer_id: 1.0, sk_customer_shard_group: 1.0 }, state: "fail", errmsg: "cannot migrate chunk, local document { _id: ObjectId('...'), account_class: "...", account_id: ObjectId('...", counts: { cloned: 6189, clonedBytes: 26043128, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "data transfer error" } from: secondset to: firstset chunk: min: { customer_id: ObjectId('...'), sk_customer_shard_group: 50 } max: { customer_id: ObjectId('...'), sk_customer_shard_group: 50 }

Finding by the 'local document' _id returns multiple results, so we have to run a script to de-duplicate the _ids. We're using the C# driver and recently updated it to the latest sub-version, which includes an improvement to ObjectId generation, but the conflicting documents tend to be older data that is only surfaced as the balancer moves chunks around.

I'm not sure how to proceed at this point, and I'm scratching my head as to why duplicate _ids are present.
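As a starting point for the de-dup script mentioned above, here is a minimal sketch in plain JavaScript (the function name and the stand-in per-shard arrays are illustrative, not from the ticket; in practice you would stream documents from each shard's collection): it counts occurrences of each _id across per-shard batches and reports any _id seen more than once.

```javascript
// Sketch: detect _id values that appear in more than one document
// across several per-shard batches of documents.
function findDuplicateIds(shards) {
  const seen = new Map(); // stringified _id -> occurrence count
  for (const docs of shards) {
    for (const doc of docs) {
      const key = String(doc._id);
      seen.set(key, (seen.get(key) || 0) + 1);
    }
  }
  // Keep only the _ids that occur more than once.
  return [...seen.entries()]
    .filter(([, count]) => count > 1)
    .map(([id]) => id);
}

// Illustrative data: _id 1 exists on both shards.
const firstset = [{ _id: 1, a: 25 }, { _id: 2, a: 7 }];
const secondset = [{ _id: 1, a: 100 }];
console.log(findDuplicateIds([firstset, secondset])); // → [ '1' ]
```

Once the duplicate _ids are known, deciding which copy to keep is application-specific and is not covered by this sketch.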



 Comments   
Comment by Michael Duminy [ 08/Sep/14 ]

Thanks Scott, the issue seems to have been in our application code where we expected some other field to be unique.

Comment by Scott Hernandez (Inactive) [ 04/Sep/14 ]

If your shard key is not _id and does not include _id as a prefix, then it is possible to have duplicate _id values in a sharded collection:
http://docs.mongodb.org/manual/reference/limits/#Unique-Indexes-in-Sharded-Collections
http://docs.mongodb.org/manual/faq/sharding/#how-does-mongodb-ensure-unique-id-field-values-when-using-a-shard-key-other-than-id

An example of where this goes wrong, and which you might be hitting, is the following type of code, where you inadvertently change the shard key:

// shard collection by {a:1}
> coll.save({_id:1, a:25, b:1})
> var doc = coll.findOne({b:1})
// change values so that it results in a new shard owning the doc
> doc.a = 100;
// this will create a new document with the same _id value, on a different shard
> coll.save(doc);
// Now you have two documents in the collection with _id:1, on different shards
> coll.count({_id:1})
2
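The shell transcript above can also be modeled outside the server. Below is a hedged stand-alone sketch in plain JavaScript (the two-shard router, `routeShard`, and `save` are toy constructs invented for illustration, not MongoDB internals) showing how a write that rewrites the shard key routes to a different shard and leaves two documents with the same _id:

```javascript
// Toy model of a two-shard cluster sharded on { a: 1 }.
// Docs with a < 50 route to shard 0; the rest route to shard 1.
const shards = [[], []];
const routeShard = (doc) => (doc.a < 50 ? 0 : 1);

// A save() that replaces a doc with the same _id on the TARGET shard,
// or inserts it there. It never looks at other shards -- which is why
// a mutated shard key can produce a duplicate _id cluster-wide.
function save(doc) {
  const s = shards[routeShard(doc)];
  const i = s.findIndex((d) => d._id === doc._id);
  if (i >= 0) s[i] = { ...doc };
  else s.push({ ...doc });
}

save({ _id: 1, a: 25, b: 1 });       // lands on shard 0
const doc = { _id: 1, a: 25, b: 1 }; // app fetches a copy
doc.a = 100;                         // shard key changed in app code
save(doc);                           // lands on shard 1: duplicate _id

const count = shards.flat().filter((d) => d._id === 1).length;
console.log(count); // → 2
```

The real server rejects in-place updates to shard key fields; the hazard is a fetch-modify-resave pattern like the one above, where the resave is just a fresh write from the server's point of view.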

As noted, the shard key must be treated as immutable. The server will not allow you to change it directly, but you must follow the same rule in your application code as well.
http://docs.mongodb.org/manual/core/sharding-shard-key/#considerations

Generated at Thu Feb 08 03:37:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.