[SERVER-15136] Duplicate _ids in production sharded cluster Created: 04/Sep/14 Updated: 18/Sep/14 Resolved: 04/Sep/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael Duminy | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | balancer, sharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Participants: |
| Description |
|
The balancer logs show that our cluster contains documents with duplicate _ids. Please see the example error below (data-specific details removed because they are sensitive).
A find on the 'local document' _id returns multiple results, so we have to run a script to de-duplicate by _id. We're using the C# driver and recently updated it to the latest sub-version, which includes an improvement to ObjectId generation, but the conflicting documents tend to be older data that only surfaces as the balancer moves chunks around. I'm not sure how to proceed at this point, and I can't work out why duplicate _ids are present. |
| Comments |
| Comment by Michael Duminy [ 08/Sep/14 ] | ||||||||||
|
Thanks Scott, the issue seems to have been in our application code where we expected some other field to be unique. | ||||||||||
| Comment by Scott Hernandez (Inactive) [ 04/Sep/14 ] | ||||||||||
|
If your shard key is neither _id nor prefixed by _id, then it is possible to have duplicate _id values in a sharded collection, because the unique _id index is enforced on each shard separately and not across shards. An example of where this doesn't work well, which you might be hitting, is the following type of code, where you inadvertently change the shard key:
As noted, the shard key must be treated as immutable. The server will not allow you to change it, but you must follow the same rules in your application code as well. |
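The code from the original comment is not preserved in this export. As a stand-in, the following is a minimal Python sketch of the failure mode described above, under stated assumptions: `ToyShardedCollection`, its `save` and `find_by_id` methods, the `region` shard key, and the split point `"M"` are all hypothetical names for a toy two-shard model. It is not MongoDB's implementation or any driver's API. It only illustrates that when writes are routed by a shard key other than _id, re-saving a document after changing that field can leave a second copy with the same _id on another shard.

```python
class ToyShardedCollection:
    """Toy two-shard collection (hypothetical model, not MongoDB itself)."""

    def __init__(self, shard_key, split_point):
        self.shard_key = shard_key      # field used to route documents
        self.split_point = split_point  # values below it go to shard 0
        self.shards = [{}, {}]          # each shard maps _id -> document

    def _route(self, doc):
        # Range-based routing on the shard key value.
        return 0 if doc[self.shard_key] < self.split_point else 1

    def save(self, doc):
        # Upsert by _id, but only on the shard the shard key routes to.
        # _id uniqueness is therefore enforced per shard, not globally.
        self.shards[self._route(doc)][doc["_id"]] = dict(doc)

    def find_by_id(self, _id):
        # Scatter-gather: ask every shard for the _id.
        return [shard[_id] for shard in self.shards if _id in shard]


coll = ToyShardedCollection(shard_key="region", split_point="M")

doc = {"_id": 1, "region": "EU", "name": "example"}
coll.save(doc)            # routed to shard 0

doc["region"] = "US"      # application code changes the shard key field
coll.save(doc)            # routed to shard 1; the old copy stays on shard 0

copies = coll.find_by_id(1)
print(len(copies))        # 2
```

In this model, sharding on _id instead routes both saves to the same shard, so the second save overwrites the first and only one copy exists.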